BOX PATENT APPLICATION

Assistant Commissioner for Patents 
Washington, DC 20231 

Sir: 

As authorized by the inventor(s), transmitted herewith for
filing is a patent application applied for on behalf of the
inventor(s) according to the provisions of 37 C.F.R. § 1.41(c),
which claims priority under 35 U.S.C. § 119(e) of Provisional
Application No. 60/159,331 filed on October 14, 1999.

Inventor(s): Nickolai ALEXANDROV, Vyacheslav BROVER

For: SEQUENCE-DETERMINED DNA FRAGMENTS AND CORRESPONDING
POLYPEPTIDES ENCODED THEREBY



Enclosed are: 

☒ A specification consisting of a Description (1078 pages),
Table 1 (640 pages), Table 2 (1487 pages), Claims (5 pages),
Schematic (1 page), and an Abstract (1 page), totaling
three thousand two hundred twelve (3212) pages

□ ( ) sheet(s) of formal drawings

□ Certified copy of Priority Document(s)

☒ Executed Declaration in accordance with 37 C.F.R. § 1.64 will
follow

☒ A statement to establish small entity status under 37 C.F.R.
§ 1.9 and 37 C.F.R. § 1.27



Mail Address: P.O. Box 747, Falls Church, Virginia, USA 22040-0747

□ Preliminary Amendment
Information Sheet 

□ Information Disclosure Statement, PTO-1449 and reference(s)

□ Amend the specification by inserting before the first line
the sentence:

-- This application claims priority on provisional Application
No. ________ filed on ________, the entire contents of which are
hereby incorporated by reference. --

☒ Other: Power of Attorney regarding Small Entity Statement,
ATCC Deposit receipts PTA-595, PTA-1161, PTA-1411, CD
containing Specification



The filing fee has been calculated as shown below:

                                       LARGE ENTITY        SMALL ENTITY

BASIC FEE                              $690.00             $345.00

               NUMBER      NUMBER
               FILED       EXTRA       RATE     FEE        RATE     FEE

TOTAL CLAIMS   50 - 20 =   30          x 18 =   $0.00      x 9 =    270

INDEPENDENT
CLAIMS         5 - 3 =     2           x 78 =   $0.00      x 39 =   78

□ MULTIPLE DEPENDENT
  CLAIMS PRESENTED                     + $260.00           + $130.00

TOTAL                                  $0.00               $693.00
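The small-entity total above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `filing_fee` and its parameter names are hypothetical, and the allowances of 20 total claims and 3 independent claims are read off the "50 - 20" and "5 - 3" entries of the fee table.

```python
from decimal import Decimal


def filing_fee(basic, total_claims, independent_claims,
               rate_total, rate_independent,
               free_total=20, free_independent=3,
               multiple_dependent_fee=Decimal("0")):
    """Basic fee plus per-claim charges for claims beyond the allowance.

    Hypothetical helper for illustration; the allowances (20 and 3) are
    assumptions taken from the fee table, not from any fee schedule.
    """
    extra_total = max(total_claims - free_total, 0)
    extra_independent = max(independent_claims - free_independent, 0)
    return (basic
            + extra_total * rate_total
            + extra_independent * rate_independent
            + multiple_dependent_fee)


# Small-entity column: $345.00 basic fee, $9 per extra claim,
# $39 per extra independent claim, no multiple dependent claims.
small_entity_total = filing_fee(Decimal("345.00"), 50, 5,
                                Decimal("9"), Decimal("39"))
print(small_entity_total)
```

With 30 extra claims and 2 extra independent claims this gives 345 + 270 + 78, which matches the $693.00 total stated on the form.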



The application transmitted herewith is filed in accordance 
with 37 C.F.R. § 1.41(c). The undersigned has been authorized 
by the inventor(s) to file the present application. The
original duly executed declaration together with the 
surcharge will be forwarded in due course. 

A check in the amount of $693.00 to cover the filing fee is 
enclosed.

□ Please charge Deposit Account No. 02-2448 in the amount of
$0.00. A triplicate copy of this transmittal form is
enclosed.

☒ Please send correspondence to:

BIRCH, STEWART, KOLASCH & BIRCH, LLP or Customer No. 2292 
P.O. Box 747 

Falls Church, VA 22040-0747 
Telephone: (703) 205-8000 



If necessary, the Commissioner is hereby authorized in this, 
concurrent, and future replies, to charge payment or credit any 
overpayment to Deposit Account No. 02-2448 for any additional fees 
required under 37 C.F.R. §§ 1.16 or 1.17; particularly, extension 
of time fees. 



Respectfully submitted, 



BIRCH, STEWART, KOLASCH & BIRCH, LLP
P.O. Box 747
Falls Church, VA 22040-0747
(703) 205-8000

STATEMENT CLAIMING SMALL ENTITY STATUS Docket Number: 2750-1237P

(37 CFR 1.9(f) & 1.27(c)) - SMALL BUSINESS CONCERN 



Applicant, Patentee, or Identifier: N. ALEXANDROV et al. 



Application or Patent No.: NEW Patent Application 



Filed or Issued: October 13, 2000 



Title: SEQUENCE-DETERMINED DNA FRAGMENTS AND CORRESPONDING POLYPEPTIDES
ENCODED THEREBY

I hereby state that I am

□ the owner of the small business concern identified below:
☒ an official of the small business concern empowered to act on behalf of
the concern identified below:



NAME OF SMALL BUSINESS CONCERN CERES, INC.



ADDRESS OF SMALL BUSINESS CONCERN 3007 Malibu Canyon Road, Malibu, CA 90265

I hereby state that the above identified small business concern qualifies as a small 
business concern as defined in 13 CFR Part 121 for purposes of paying reduced fees to the United
States Patent and Trademark Office, in that the number of employees of the concern, including 
those of its affiliates, does not exceed 500 persons. For purposes of this statement, (1) the 
number of employees of the business concern is the average over the previous fiscal year of the 
concern of the persons employed on a full-time, part-time, or temporary basis during each of the 
pay periods of the fiscal year, and (2) concerns are affiliates of each other when either, 
directly or indirectly, one concern controls or has the power to control the other, or a third 
party or parties controls or has the power to control both. 

I hereby state that rights under contract or law have been conveyed to and remain with 
the small business concern identified above with regard to the invention described in: 

the specification filed herewith with title as listed above. 
□ the application identified above.

□ the patent identified above. 

If the rights held by the above identified small business concern are not 
exclusive, each individual, concern, or organization having rights in the invention 
must file separate statements as to their status as small entities, and no rights to 
the invention are held by any person, other than the inventor, who would not qualify 
as an independent inventor under 37 CFR 1.9(c) if that person made the invention, or 
by any concern which would not qualify as a small business concern under 37 CFR 
1.9(d), or a nonprofit organization under 37 CFR 1.9(e). 

Each person, concern, or organization having any rights in the 
invention is listed below: 

☒ no such person, concern, or organization exists.

□ each such person, concern, or organization is listed below. 

Separate statements are required from each named person, concern, or
organization having rights to the invention stating their status as small entities. 
(37 CFR 1.27) 

I acknowledge the duty to file, in this application or patent, notification of 
any change in status resulting in loss of entitlement to small entity status prior to 
paying, or at the time of paying, the earliest of the issue fee or any maintenance fee 
due after the date on which status as a small entity is no longer appropriate. (37
CFR 1.28(b))

NAME OF PERSON SIGNING Raymond C. Stewart (Reg. No. 21,066)

TITLE IN ORGANIZATION OF PERSON SIGNING Legal Representative of CERES, INC.

ADDRESS OF PERSON SIGNING Birch, Stewart, Kolasch and Birch, LLP
P.O. Box 747, Falls Church, VA 22040-0747




SIGNATURE [signature] DATE October 13, 2000

Rev. 10/12/1998

SEQUENCE-DETERMINED DNA FRAGMENTS AND CORRESPONDING
POLYPEPTIDES ENCODED THEREBY

This application claims priority under 35 USC § 119(e), § 119(a-d) and § 120 of the
following applications, the entire contents of which are hereby incorporated by reference:

Country         Filing Date   Attorney No.   Client No.   Application No.
United States   10/14/1999    2750-0578P     80146.001    60/159,331

FIELD OF THE INVENTION

The present invention relates to isolated polynucleotides that represent a complete 
gene, or a fragment thereof, that is expressed. In addition, the present invention relates to the 
polypeptide or protein corresponding to the coding sequence of these polynucleotides. The 
present invention also relates to isolated polynucleotides that represent regulatory regions of
genes. The present invention also relates to isolated polynucleotides that represent 
untranslated regions of genes. The present invention further relates to the use of these isolated 
polynucleotides and polypeptides and proteins. 

DESCRIPTION OF THE RELATED ART 

Efforts to map and sequence the genome of a number of organisms are in progress; a few
complete genome sequences, for example those of E. coli and Saccharomyces cerevisiae, are
known (Blattner et al., Science 277:1453 (1997); Goffeau et al., Science 274:546 (1996)). The
complete genome of a multicellular organism, C. elegans, has also been sequenced (See, the C.
elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete genome of a
plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION 

The present invention comprises polynucleotides, such as complete cDNA sequences
and/or sequences of genomic DNA encompassing complete genes, fragments of genes, and/or
regulatory elements of genes and/or regions with other functions and/or intergenic regions,
hereinafter collectively referred to as Sequence-Determined DNA Fragments (SDFs), from
different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and
other plants and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or
proteins derived therefrom. In some instances, the SDFs span the entirety of a protein-coding
segment. In some instances, the entirety of an mRNA is represented. Other objects of the
invention that are also represented by SDFs of the invention are control sequences, such as, but
not limited to, promoters. Complements of any sequence of the invention are also considered
part of the invention.

Other objects of the invention are polynucleotides comprising exon sequences, 
polynucleotides comprising intron sequences, polynucleotides comprising introns together with 
exons, intron/exon junction sequences, 5' untranslated sequences, and 3' untranslated sequences
of the SDFs of the present invention. Polynucleotides representing the joinder of any exons 
described herein, in any arrangement, for example, to produce a sequence encoding any 
desirable amino acid sequence are within the scope of the invention. 

The present invention also resides in probes useful for isolating and identifying nucleic 
acids that hybridize to an SDF of the invention. The probes can be of any length, but more 
typically are 12-2000 nucleotides in length; more typically, 15 to 200 nucleotides long; even 
more typically, 18 to 100 nucleotides long. 

Yet another object of the invention is a method of isolating and/or identifying nucleic 
acids using the following steps: 

(a) contacting a probe of the instant invention with a polynucleotide sample under 
conditions that permit hybridization and formation of a polynucleotide duplex; and 

(b) detecting and/or isolating the duplex of step (a). 

The conditions for hybridization can be from low to moderate to high stringency 
conditions. The sample can include a polynucleotide having a sequence unique in a plant 
genome. Probes and methods of the invention are useful, for example, without limitation, for 
mapping of genetic traits and/or for positional cloning of a desired fragment of genomic DNA. 

Probes and methods of the invention can also be used for detecting alternatively spliced 
messages within a species. Probes and methods of the invention can further be used to detect or 
isolate related genes in other plant species using genomic DNA (gDNA) and/or cDNA libraries. 
In some instances, especially when longer probes and low to moderate stringency hybridization
conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA sequences of
a plant. This approach is useful for isolating representatives of gene families which are 
identifiable by possession of a common functional domain in the gene product or which have 
common cis-acting regulatory sequences. This approach is also useful for identifying 
orthologous genes from other organisms. 

The present invention also resides in constructs for modulating the expression of the 
genes comprised of all or a fragment of an SDF. The constructs comprise all or a fragment of 
the expressed SDF, or of a complementary sequence. Examples of constructs include 
ribozymes comprising RNA encoded by an SDF or by a sequence complementary thereto,
antisense constructs, constructs comprising coding regions or parts thereof, constructs
comprising promoters, introns, untranslated regions, scaffold attachment regions, methylating
regions, enhancing or reducing regions, DNA and chromatin conformation modifying
sequences, etc. Such constructs can be constructed using viral, plasmid, bacterial artificial
chromosomes (BACs), plasmid artificial chromosomes (PACs), autonomous plant plasmids,
plant artificial chromosomes or other types of vectors and exist in the plant as autonomous
replicating sequences or as DNA integrated into the genome. When inserted into a host cell
the construct is, preferably, functionally integrated with, or operatively linked to, a
heterologous polynucleotide. For instance, a coding region from an SDF might be operably
linked to a promoter that is functional in a plant.

The present invention also resides in host cells, including bacterial or yeast cells or plant 
cells, and plants that harbor constructs such as described above. Another aspect of the invention 
relates to methods for modulating expression of specific genes in plants by expression of the 
coding sequence of the constructs, by regulation of expression of one or more endogenous genes
in a plant or by suppression of expression of the polynucleotides of the invention in a plant.
Methods of modulation of gene expression include without limitation (1) inserting into a host
cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an
endogenous promoter in a host cell; (3) inserting antisense or ribozyme constructs into a host
cell; and (4) inserting into a host cell a polynucleotide comprising a sequence encoding a variant,
fragment, or fusion of the native polypeptides of the instant invention.

BRIEF DESCRIPTION OF THE TABLES 

The sequences of exemplary SDFs and polypeptides corresponding to the coding
sequences of the instant invention are described in Table 1 and Table 2. Table 1 refers to a
number of "Maximum Length Sequences" or "MLS." Each MLS corresponds to the longest
cDNA obtained, either by cloning or by prediction from genomic sequence. The
sequence of the MLS is the cDNA sequence as described in the Av subsection of Table 1.

Table 1 includes the following information relating to each MLS:



I. cDNA Sequence
   A. 5' UTR
   B. Coding Sequence
   C. 3' UTR
II. Genomic Sequence
   A. Exons
   B. Introns
   C. Promoters
III. Link of cDNA Sequences to Clone IDs
IV. Multiple Transcription Start Sites
V. Polypeptide Sequences
   A. Signal Peptide
   B. Domains
   C. Related Polypeptides
VI. Related Polynucleotide Sequences



I. cDNA SEQUENCE 

Table 1 indicates which sequence in Table 2 represents the sequence of each MLS. 
The MLS sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, 
specific cDNA clone numbers also are included in Table 1 when the MLS sequence relates to
a specific cDNA clone. 

A. 5' UTR 

The location of the 5' UTR can be determined by comparing the most 5' MLS
sequence with the corresponding genomic sequence as indicated in Table 1. The sequence
that matches, beginning at any of the transcriptional start sites and ending at the last
nucleotide before any of the translational start sites, corresponds to the 5' UTR.

B. Coding Region 

The coding region is the sequence in any open reading frame found in the MLS. 
Coding regions of interest are indicated in the PolyP SEQ subsection of Table 1.

C. 3' UTR 

The location of the 3' UTR can be determined by comparing the most 3' MLS
sequence with the corresponding genomic sequence as indicated in Table 1. The sequence
that matches, beginning at the translational stop site and ending at the last nucleotide of the
MLS, corresponds to the 3' UTR.
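The three definitions above amount to cutting the MLS at the transcription start site, the translational start site and the translational stop site. The sketch below illustrates this on a made-up sequence. The function `split_mls` and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive, and the 3' UTR is assumed to begin immediately after the stop codon (the text could also be read as including the stop codon).

```python
def split_mls(mls, tss, cds_start, cds_end):
    """Split an MLS into (5' UTR, coding region, 3' UTR).

    Hypothetical helper. All positions are 1-based and inclusive:
      tss       - transcription start site within the MLS
      cds_start - first base of the translational start codon
      cds_end   - last base of the stop codon (assumption: the stop codon
                  is counted with the coding region, not the 3' UTR)
    """
    if not (1 <= tss <= cds_start <= cds_end <= len(mls)):
        raise ValueError("coordinates must satisfy 1 <= tss <= cds_start <= cds_end <= len(mls)")
    five_prime_utr = mls[tss - 1:cds_start - 1]   # ends one base before the start codon
    coding_region = mls[cds_start - 1:cds_end]
    three_prime_utr = mls[cds_end:]               # runs to the last base of the MLS
    return five_prime_utr, coding_region, three_prime_utr


# Made-up example sequence: 6-base 5' UTR, 9-base ORF, 5-base 3' UTR.
example_mls = "GGCACC" + "ATGGCTTAA" + "TTTCG"
utr5, cds, utr3 = split_mls(example_mls, tss=1, cds_start=7, cds_end=15)
print(utr5, cds, utr3)
```

Because an MLS can have several transcription and translation start sites, the same call can be repeated with each combination of `tss` and `cds_start` of interest.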
II. GENOMIC SEQUENCE

Further, Table 1 indicates the specific "gi" number of the genomic sequence if the
sequence resides in a public databank. For each genomic sequence, Table 1 indicates which
regions are included in the MLS. These regions can include the 5' and 3' UTRs as well as
the coding sequence of the MLS. See, for example, the scheme below:



             Region 1             Region 2            Region 3
Promoter | 5' UTR | Exon |-Intron-| Exon |-Intron-| Exon | 3' UTR |
                  ^                                      ^
      Translational Start Site                      Stop Codon

Table 1 reports the first and last base of each region that is included in an MLS
sequence. An example is shown below:
gi No. 47000:
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi
No. 47000: a first region including bases 37102-37497, and a second region including bases
37593-37925.
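As an illustration of how such an entry can be read programmatically, the sketch below parses the example for gi No. 47000 and reports the length of each region. The function names `parse_regions` and `region_length` are hypothetical, and the "first ... last" line format is assumed from the example shown above.

```python
import re


def parse_regions(entry):
    """Return (first_base, last_base) pairs from 'first ... last' lines.

    Hypothetical helper; lines that do not match the pattern (such as the
    'gi No.' header) are skipped.
    """
    regions = []
    for line in entry.splitlines():
        match = re.fullmatch(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*", line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_length(first, last):
    """Number of bases in a region, counting both end points."""
    return abs(last - first) + 1


example_entry = """gi No. 47000:
37102 ... 37497
37593 ... 37925"""

regions = parse_regions(example_entry)
lengths = [region_length(first, last) for first, last in regions]
print(regions, lengths, sum(lengths))
```

For this example the two regions are 396 and 333 bases long, so they contribute 729 bases of genomic sequence to the MLS.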

A. EXON SEQUENCES

The location of the exons can be determined by comparing the sequence of the
regions from the genomic sequences with the corresponding MLS sequence as indicated by
Table 1.

i. INITIAL EXON

To determine the location of the initial exon, information from the 

(1) polypeptide sequence section; 

(2) cDNA polynucleotide section; and 

(3) the genomic sequence section 

of Table 1 is used. First, the polypeptide section will indicate where the translational start 
site is located in the MLS sequence. The MLS sequence can be matched to the genomic 
sequence that corresponds to the MLS. Based on the match between the MLS and 
corresponding genomic sequences, the location of the translational start site can be
determined in one of the regions of the genomic sequence. The location of this translational 
start site is the start of the first exon. 

Generally, the last base of the exon of the corresponding genomic region, in which the 
translational start site was located, will represent the end of the initial exon. In some cases, 
the initial exon will end with a stop codon, when the initial exon is the only exon. 

In the case when sequences representing the MLS are in the positive strand of the 
corresponding genomic sequence, the last base will be a larger number than the first base. 
When the sequences representing the MLS are in the negative strand of the corresponding 
genomic sequence, then the last base will be a smaller number than the first base. 
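The strand rule in the preceding paragraph can be expressed as a simple comparison of the two coordinates reported for a region. This is a minimal sketch; the function name `strand_of` is hypothetical, and returning `None` for a single-base region is an assumption, since the text does not cover that case.

```python
def strand_of(first_base, last_base):
    """Infer the genomic strand from a region's first and last base.

    Hypothetical helper: '+' when the last base is the larger number,
    '-' when it is the smaller number, None when they are equal
    (a case the text does not address).
    """
    if last_base > first_base:
        return "+"
    if last_base < first_base:
        return "-"
    return None


print(strand_of(37102, 37497))  # last base larger than first base
print(strand_of(37925, 37593))  # last base smaller than first base
```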

ii. INTERNAL EXONS 

Except for the regions that comprise the 5' and 3' UTRs, initial exon, and terminal
exon, the remaining genomic regions that match the MLS sequence are the internal exons. 
Specifically, the bases defining the boundaries of the remaining regions also define the 
intron/exon junctions of the internal exons. 

iii. TERMINAL EXON 

As with the initial exon, the location of the terminal exon is determined with 
information from the 

(1) polypeptide sequence section; 

(2) cDNA polynucleotide section; and 

(3) the genomic sequence section 

of Table 1. The polypeptide section will indicate where the stop codon is located in the MLS
sequence. The MLS sequence can be matched to the corresponding genomic sequence.
Based on the match between MLS and corresponding genomic sequences, the location of the
stop codon can be determined in one of the regions of the genomic sequence. The location of
this stop codon is the end of the terminal exon. Generally, the first base of the exon of the 
corresponding genomic region that matches the cDNA sequence, in which the stop codon was 
located, will represent the beginning of the terminal exon. In some cases, the translational 
start site will represent the start of the terminal exon, which will be the only exon. 

In the case when the MLS sequences are in the positive strand of the corresponding 
genomic sequence, the last base will be a larger number than the first base. When the MLS 
sequences are in the negative strand of the corresponding genomic sequence, then the last 
base will be a smaller number than the first base. 

B. INTRON SEQUENCES

In addition, the introns corresponding to the MLS are defined by identifying the
genomic sequence located between the regions where the genomic sequence comprises
exons. Thus, introns are defined as starting one base downstream of a genomic region
comprising an exon, and ending one base upstream of the next genomic region comprising an exon.
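A short sketch of this rule, using the two regions given earlier for gi No. 47000 (37102-37497 and 37593-37925). The function name `introns_between` is hypothetical, and the regions are assumed to be listed in transcript order, so that a negative-strand gene has descending coordinates.

```python
def introns_between(regions):
    """Return (first_base, last_base) of each intron between adjacent regions.

    Hypothetical helper. `regions` is a list of (first_base, last_base)
    pairs in transcript order (an assumption). An intron starts one base
    downstream of one region and ends one base upstream of the next.
    """
    introns = []
    for (first_a, last_a), (first_b, _last_b) in zip(regions, regions[1:]):
        step = 1 if last_a >= first_a else -1   # -1 for negative-strand regions
        introns.append((last_a + step, first_b - step))
    return introns


plus_strand = [(37102, 37497), (37593, 37925)]
plus_introns = introns_between(plus_strand)
print(plus_introns)

# Made-up negative-strand example with descending coordinates.
minus_introns = introns_between([(500, 400), (300, 200)])
print(minus_introns)
```

For the gi No. 47000 example this yields a single intron spanning bases 37498 to 37592, which is 95 bases long.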

C. PROMOTER SEQUENCES 

As indicated below, promoter sequences corresponding to the MLS are defined as 
sequences upstream of the first exon; more usually, as sequences upstream of the first of 
multiple transcription start sites; even more usually as sequences about 2,000 nucleotides 
upstream of the first of multiple transcription start sites. 
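The narrowest of these definitions, about 2,000 nucleotides upstream of the first transcription start site, can be turned into genomic coordinates as sketched below. The function `promoter_window` and its parameters are hypothetical, the start site is assumed to be given as a genomic coordinate, and clamping at base 1 on the positive strand is an added assumption.

```python
def promoter_window(first_tss, strand="+", length=2000):
    """Return (first_base, last_base) of the region upstream of a start site.

    Hypothetical helper. `first_tss` is the genomic coordinate of the first
    transcription start site. The result follows the convention that the
    last base is the larger number on the positive strand and the smaller
    number on the negative strand.
    """
    if strand == "+":
        # Upstream means smaller coordinates; do not run past base 1.
        return (max(1, first_tss - length), first_tss - 1)
    if strand == "-":
        # Upstream means larger coordinates on the negative strand.
        return (first_tss + length, first_tss + 1)
    raise ValueError("strand must be '+' or '-'")


print(promoter_window(37102, "+"))
print(promoter_window(37925, "-"))
```

The `length` argument can be changed when a broader or narrower definition of the promoter is wanted.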

III. LINK of cDNA SEQUENCES to CLONE IDs

As noted above, Table 1 identifies the cDNA clone(s) that relate to each MLS. The
MLS sequence can be longer than the sequences included in the cDNA clones. In such a
case, Table 1 indicates the region of the MLS that is included in the clone. If either the 5' or
3' terminus of the cDNA clone sequence is the same as that of the MLS sequence, no mention
will be made.

IV. Multiple Transcription Start Sites 

Initiation of transcription can occur at a number of sites of the gene. Table 1 indicates 
the possible multiple transcription sites for each gene. In Table 1, the location of the 
transcription start sites can be either a positive or negative number.
The positions indicated by positive numbers refer to the transcription start sites as located in
the MLS sequence. The negative numbers indicate the transcription start site within the
genomic sequence that corresponds to the MLS.

To determine the location of the transcription start sites with the negative numbers,
the MLS sequence is aligned with the corresponding genomic sequence. In the instances
when a public genomic sequence is referenced, the relevant corresponding genomic sequence
can be found by direct reference to the nucleotide sequence indicated by the "gi" number
shown in the public genomic DNA section of Table 1. When the position is a negative
number, the transcription start site is located in the corresponding genomic sequence
upstream of the base that matches the beginning of the MLS sequence in the alignment. The
negative number is relative to the first base of the MLS sequence which matches the genomic
sequence corresponding to the relevant "gi" number.
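One way to read a negative position is sketched below. The function `tss_genomic_position` is hypothetical, and the sketch assumes that a value of -n means n bases upstream of the genomic base that aligns with the first base of the MLS; the text does not spell out this counting convention.

```python
def tss_genomic_position(tss, anchor, strand="+"):
    """Convert a negative transcription start site to a genomic coordinate.

    Hypothetical helper.
      tss    - negative position as reported in Table 1
      anchor - genomic coordinate aligned with the first base of the MLS
      strand - '+' or '-' strand of the genomic sequence
    Assumption: -n means n bases upstream of `anchor`.
    """
    if tss >= 0:
        raise ValueError("only negative positions refer to the genomic sequence")
    if strand == "+":
        return anchor + tss   # upstream is toward smaller coordinates
    if strand == "-":
        return anchor - tss   # upstream is toward larger coordinates
    raise ValueError("strand must be '+' or '-'")


print(tss_genomic_position(-30, anchor=37102, strand="+"))
print(tss_genomic_position(-30, anchor=37925, strand="-"))
```

Positive positions need no conversion, since they are already coordinates within the MLS sequence.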

In the instances when no public genomic DNA is referenced, the relevant nucleotide 
sequence for alignment is the nucleotide sequence associated with the amino acid sequence 
designated by the "gi" number of the later PolyP SEQ subsection.

V. Polypeptide Sequences 

The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NOs for polypeptide
sequences corresponding to the coding sequence of the MLS sequence and the location of the
translational start site within the coding sequence of the MLS sequence.

The MLS sequence can have multiple translational start sites and can be capable of 
producing more than one polypeptide sequence. 

A. Signal Peptide 

Table 1 also indicates in subsection (B) the cleavage site of the putative signal peptide 
of the polypeptide corresponding to the coding sequence of the MLS sequence. Typically, 
signal peptide coding sequences comprise a sequence encoding the first residue of the 
polypeptide to the cleavage site residue. 
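The relationship between the cleavage site and the signal peptide coding sequence can be sketched as follows. Both functions are hypothetical, the example polypeptide is made up, and the cleavage site residue is assumed to be the last residue of the signal peptide, counted from 1.

```python
def split_signal_peptide(polypeptide, cleavage_site):
    """Split a polypeptide into (signal peptide, remaining polypeptide).

    Hypothetical helper. `cleavage_site` is the 1-based number of the
    cleavage site residue, assumed to be the last signal peptide residue.
    """
    if not 1 <= cleavage_site < len(polypeptide):
        raise ValueError("cleavage site must fall inside the polypeptide")
    return polypeptide[:cleavage_site], polypeptide[cleavage_site:]


def signal_peptide_coding_range(translation_start, cleavage_site):
    """Nucleotide range (first, last) in the cDNA that encodes the signal peptide.

    `translation_start` is the 1-based nucleotide of the translational start
    site; each residue is encoded by three nucleotides.
    """
    return translation_start, translation_start + 3 * cleavage_site - 1


# Made-up polypeptide with an assumed cleavage site at residue 10.
signal, mature = split_signal_peptide("MKTLLLAVAVQSTE", cleavage_site=10)
coding_range = signal_peptide_coding_range(translation_start=7, cleavage_site=10)
print(signal, mature, coding_range)
```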

B. Domains

Subsection (C) provides information regarding identified domains (where present) 
within the polypeptide and (where present) a name for the polypeptide domain. 



C. Related Polypeptides

Subsection (Dp) provides (where present) information concerning amino acid
sequences that are found to be related and have some percentage of sequence identity to the
polypeptide sequences of Table 1 and Table 2. These related sequences are identified by a
"gi" number.

VI. Related Polynucleotide Sequences

Subsection (Dn) provides polynucleotide sequences (where present) that are related to 
and have some percentage of sequence identity to the MLS or corresponding genomic 
sequence. 

Abbreviation                    Description

Max Len. Seq.                   Maximum Length Sequence
rel to                          Related to
[abbreviation missing]          Clone ID numbers
Pub gDNA                        Public Genomic DNA
gi No.                          gi number
Gen. seq. in cDNA               Genomic Sequence in cDNA
                                (Each region for a single gene prediction is
                                listed on a separate line. In the case of
                                multiple gene predictions, the group of regions
                                relating to a single prediction are separated
                                by a blank line)
[abbreviation illegible]        cDNA sequence
- Pat. Appln. SEQ ID NO         Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877      Ceres SEQ ID NO:
- SEQ # w. TSS                  Location within the cDNA sequence, SEQ ID
                                NO: #, of Transcription Start Sites which are
                                listed below
- Clone ID #: # -> #            Clone ID comprises bases # to # of the cDNA
                                Sequence
PolyP SEQ                       Polypeptide Sequence
- Pat. Appln. SEQ ID NO:        Patent Application SEQ ID NO:
- Ceres SEQ ID NO               Ceres SEQ ID NO:
- Loc. SEQ ID NO: @ nt.         Location of translational start site in cDNA of
                                SEQ ID NO: at nucleotide number
(C) Pred. PP Nom. & Annot.      Nomination and Annotation of Domains within
                                Predicted Polypeptide(s)
- (Title)                       Name of Domain
- Loc. SEQ ID NO #: # -> # aa.  Location of the domain within the polypeptide
                                of SEQ ID NO: from # to # amino acid
                                residues.
(Dp) Rel. AA SEQ                Related Amino Acid Sequences
- Align. NO                     Alignment number
- gi No                         gi number
- Desp.                         Description
- % Idnt.                       Percent identity
- Align. Len.                   Alignment Length
- Loc. SEQ ID NO: # -> # aa     Location within SEQ ID NO: from # to #
                                amino acid residue.



DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to (I) polynucleotides and methods of use thereof, such as
IA. Probes, Primers and Substrates;
IB. Methods of Detection and Isolation;
    B.1. Hybridization;
    B.2. Methods of Mapping;
    B.3. Southern Blotting;
    B.4. Isolating cDNA from Related Organisms;
    B.5. Isolating and/or Identifying Orthologous Genes
IC. Methods of Inhibiting Gene Expression
    C.1. Antisense
    C.2. Ribozyme Constructs;
    C.3. Chimeraplasts;
    C.4. Co-Suppression;
    C.5. Transcriptional Silencing
    C.6. Other Methods to Inhibit Gene Expression
ID. Methods of Functional Analysis;
IE. Promoter Sequences and Their Use;
IF. UTRs and/or Intron Sequences and Their Use; and
IG. Coding Sequences and Their Use.

The invention also relates to (II) polypeptides and proteins and methods of use thereof,
such as
IIA. Native Polypeptides and Proteins
    A.1 Antibodies
    A.2 In Vitro Applications
IIB. Polypeptide Variants, Fragments and Fusions
    B.1 Variants
    B.2 Fragments
    B.3 Fusions

The invention also includes (III) methods of modulating polypeptide production, such as
IIIA. Suppression
    A.1 Antisense
    A.2 Ribozymes
    A.3 Co-suppression
    A.4 Insertion of Sequences into the Gene to be Modulated
    A.5 Promoter Modulation
    A.6 Expression of Genes containing Dominant-Negative Mutations
IIIB. Enhanced Expression
    B.1 Insertion of an Exogenous Gene
    B.2 Promoter Modulation

The invention further concerns (IV) gene constructs and vector construction, such as
IVA. Coding Sequences
IVB. Promoters
IVC. Signal Peptides

The invention still further relates to
V. Transformation Techniques



Definitions 



Allelic variant: An "allelic variant" is an alternative form of the same SDF, which
resides at the same chromosomal locus in the organism. Allelic variations can occur in any
portion of the gene sequence, including regulatory regions. Allelic variants can arise by
normal genetic variation in a population. Allelic variants can also be produced by genetic
engineering methods. An allelic variant can be one that is found in a naturally occurring
plant, including a cultivar or ecotype. An allelic variant may or may not give rise to a
phenotypic change, and may or may not be expressed. An allele can result in a detectable
change in the phenotype of the trait represented by the locus. A phenotypically silent allele
can give rise to a product.

Alternatively spliced messages: Within the context of the current invention,
"alternatively spliced messages" refers to mature mRNAs originating from a single gene with
variations in the number and/or identity of exons, introns and/or intron-exon junctions. 

Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs
wherein at least two of the elements of the gene or construct, such as the promoter and the 
coding sequence and/or other regulatory sequences and/or filler sequences and/or complements 
thereof, are heterologous to each other. 

Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote 
transcription under most, but not necessarily all, environmental conditions and states of 
development or cell differentiation. Examples of constitutive promoters include the cauliflower 
mosaic virus (CaMV) 35S transcript initiation region and the 1' or 2' promoter derived from
T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various
plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

Coordinately Expressed: The term "coordinately expressed," as used in the current
invention, refers to genes that are expressed at the same or a similar time and/or stage and/or 
under the same or similar environmental conditions. 

Domain: Domains are fingerprints or signatures that can be used to characterize
protein families and/or parts of proteins. Such fingerprints or signatures can comprise
conserved (1) primary sequence, (2) secondary structure, and/or (3) three-dimensional
conformation. Generally, each domain has been associated with either a family of proteins or
motifs. Typically, these families and/or motifs have been correlated with specific in-vitro
and/or in-vivo activities. A domain can be any length, including the entirety of the sequence
of a protein. Detailed descriptions of the domains, associated families and motifs, and
correlated activities of the polypeptides of the instant invention are described below.
Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is
exhibited by any polypeptide that comprises the same domain(s).

Endogenous: The term "endogenous," within the context of the current invention, refers to
any polynucleotide, polypeptide or protein sequence which is a natural part of a cell or
organisms regenerated from said cell.

Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide
or protein sequence, whether chimeric or not, that is initially or subsequently introduced into
the genome of an individual host cell or the organism regenerated from said host cell by any
means other than by a sexual cross. Examples of means by which this can be accomplished
are described below, and include Agrobacterium-mediated transformation (of dicots - e.g.
Salomon et al., EMBO J. 3:141 (1984); Herrera-Estrella et al., EMBO J. 2:987 (1983); of
monocots, representative papers are those by Escudero et al., Plant J. 10:355 (1996), Ishida et
al., Nature Biotechnology 14:745 (1996), May et al., Bio/Technology 13:486 (1995)), biolistic
methods (Armaleo et al., Current Genetics 17:97 (1990)), electroporation, in planta
techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to
here as a T0 for the primary transgenic plant and T1 for the first generation. The term
"exogenous" as used herein is also intended to encompass inserting a naturally found element
into a non-naturally found location.

Filler sequence: As used herein, "filler sequence" refers to any nucleotide sequence that 
is inserted into a DNA construct to evoke a particular spacing between particular components 
such as a promoter and a coding region and may provide an additional attribute such as a 
restriction enzyme site. 

Gene: The term "gene," as used in the context of the current invention, encompasses all 
regulatory and coding sequence contiguously associated with a single hereditary unit with a 
genetic function (see SCHEMATIC 1). Genes can include non-coding sequences that 
modulate the genetic function that include, but are not limited to, those that specify 
polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, 
extent and position of base methylation and binding sites of proteins that control all of these. 
Genes comprised of "exons" (coding sequences), which may be interrupted by "introns" 
(non-coding sequences), encode proteins. A gene's genetic function may require only RNA 
expression or protein production, or may only require binding of proteins and/or nucleic acids 
without associated expression. In certain cases, genes adjacent to one another may share 
sequence in such a way that one gene will overlap the other. A gene can be found within the 
genome of an organism, artificial chromosome, plasmid, vector, etc., or as a separate isolated 
entity. 

Gene Family: "Gene family" is used in the current invention to describe a group of 
functionally related genes, each of which encodes a separate protein. 

Heterologous sequences: "Heterologous sequences" are those that are not operatively 
linked or are not contiguous to each other in nature. For example, a promoter from corn is 
considered heterologous to an Arabidopsis coding region sequence. Also, a promoter from a 
gene encoding a growth factor from corn is considered heterologous to a sequence encoding the 
corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3' end 
termination sequences that do not originate in nature from the same gene as the coding sequence 
originates from, are considered heterologous to said coding sequence. Elements operatively 
linked in nature and contiguous to each other are not heterologous to each other. On the other 
hand, these same elements remain operatively linked but become heterologous if other filler 
sequence is placed between them. Thus, the promoter and coding sequences of a corn gene 
expressing an amino acid transporter are not heterologous to each other, but the promoter and 
coding sequence of a corn gene operatively linked in a novel manner are heterologous. 

Homologous gene: In the current invention, "homologous gene" refers to a gene that shares 
sequence similarity with the gene of interest. This similarity may be in only a fragment of the 
sequence and often represents a functional domain such as, without limitation, a DNA binding 
domain, a domain with tyrosine kinase activity, or the like. The 
functional activities of homologous genes are not necessarily the same. 

Inducible Promoter: An "inducible promoter" in the context of the current invention 
refers to a promoter which is regulated under certain conditions, such as light, chemical 
concentration, protein concentration, conditions in an organism, cell, or organelle, etc. A typical 
example of an inducible promoter, which can be utilized with the polynucleotides of the present 
invention, is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine 
kinase enzyme, and which promoter is induced by dehydration, abscisic acid and sodium 
chloride (Wang and Goodman, Plant J. 8:37 (1995)). Examples of environmental conditions that 
may affect transcription by inducible promoters include anaerobic conditions, elevated 
temperature, or the presence of light. 

Intergenic region: "Intergenic region," as used in the current invention, refers to 
nucleotide sequence occurring in the genome that separates adjacent genes. 

Mutant gene: In the current invention, "mutant" refers to a heritable change in DNA 
sequence at a specific location. Mutants of the current invention may or may not have an 
associated identifiable function when the mutant gene is transcribed. 

Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that 
encodes a gene product that performs a similar function as the product of a first gene. The 
orthologous gene may also have a degree of sequence similarity to the first gene. The 
orthologous gene may encode a polypeptide that exhibits a degree of sequence similarity to a 
polypeptide corresponding to a first gene. The sequence similarity can be found within a 
functional domain or along the entire length of the coding sequence of the genes and/or their 
corresponding polypeptides. 

Percentage of sequence identity: "Percentage of sequence identity," as used herein, is 
determined by comparing two optimally aligned sequences over a comparison window, where 
the fragment of the polynucleotide or amino acid sequence in the comparison window may 
comprise additions or deletions (e.g., gaps or overhangs) as compared to the reference sequence 
(which does not comprise additions or deletions) for optimal alignment of the two sequences. 
The percentage is calculated by determining the number of positions at which the identical 
nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched 
positions, dividing the number of matched positions by the total number of positions in the 
window of comparison and multiplying the result by 100 to yield the percentage of sequence 
identity. Optimal alignment of sequences for comparison may be conducted by the local 
homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology 
alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for 
similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. (USA) 85: 2444 (1988), by 
computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and 
TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 
Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for 
comparison, GAP and BESTFIT are preferably employed to determine their optimal alignment. 
Typically, the default values of 5.00 for gap weight and 0.30 for gap weight length are used. 
The term "substantial sequence identity" between polynucleotide or polypeptide sequences 
refers to polynucleotide or polypeptide comprising a sequence that has at least 80% sequence 
identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, 
even more preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a 
reference sequence using the programs. 
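
The percentage calculation defined above can be illustrated with a minimal sketch. It assumes the two sequences have already been optimally aligned (for example by GAP or BESTFIT) and are supplied as equal-length strings in which "-" marks a gap; the function names and the gap character are illustrative assumptions, not part of the specification.

```python
def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percentage of positions in the comparison window holding the identical residue.

    Both inputs must be pre-aligned strings of equal length. A position counts
    as matched only when both sequences carry the same non-gap character.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_query.upper())
        if a == b and a != gap
    )
    # Matched positions divided by total positions in the window, times 100.
    return 100.0 * matched / len(aligned_ref)


def has_substantial_identity(pct: float, threshold: float = 80.0) -> bool:
    """True when the identity meets the 80% floor stated in the definition."""
    return pct >= threshold


if __name__ == "__main__":
    pct = percent_identity("ACGT-A", "ACGTTA")
    print(round(pct, 2), has_substantial_identity(pct))
```

In this sketch a gapped position counts toward the window length but never toward the matched positions, which follows the wording "total number of positions in the window of comparison."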

Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in 
plant cells and can drive or facilitate transcription of a fragment of the SDF of the instant 
invention or a coding sequence of the SDF of the instant invention. Such promoters need not 
be of plant origin. For example, promoters derived from plant viruses, such as the CaMV35S 
promoter or from Agrobacterium tumefaciens such as the T-DNA promoters, can be plant 
promoters. A typical example of a plant promoter of plant origin is the maize ubiquitin-1 
(ubi-1) promoter known to those of skill. 



Promoter: The term "promoter," as used herein, refers to a region of sequence 
determinants located upstream from the start of transcription of a gene and which are involved in 
recognition and binding of RNA polymerase and other proteins to initiate and modulate 
transcription. A basal promoter is the minimal sequence necessary for assembly of a 
transcription complex required for transcription initiation. Basal promoters frequently include a 
"TATA box" element usually located between 15 and 35 nucleotides upstream from the site of 
initiation of transcription. Basal promoters also sometimes include a "CCAAT box" element 
(typically a sequence CCAAT) and/or a GGGCG sequence, usually located between 40 and 200 
nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of transcription. 

Public sequence: The term "public sequence," as used in the context of the instant 
application, refers to any sequence that has been deposited in a publicly accessible database. 
This term encompasses both amino acid and nucleotide sequences. Such sequences are 
publicly accessible, for example, on the BLAST databases on the NCBI FTP web site 
(accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes "gi" numbers 
assigned by NCBI as a unique identifier for each sequence in the databases, thereby 
providing a non-redundant database for sequence from various databases, including 
GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven Protein Data 
Bank). 

Regulatory Sequence: The term "regulatory sequence," as used in the current 
invention, refers to any nucleotide sequence that influences transcription or translation 
initiation and rate, and stability and/or mobility of the transcript or polypeptide product. 
Regulatory sequences include, but are not limited to, promoters, promoter control elements, 
protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, 
polyadenylation sequence, introns, certain sequences within a coding sequence, etc. 

Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide 
sequence that exhibits some degree of sequence similarity with a sequence described by 
Table 1 and Table 2. 

Scaffold Attachment Region (SAR): As used herein, "scaffold attachment region" is a DNA 
sequence that anchors chromatin to the nuclear matrix or scaffold to generate loop domains 
that can have either a transcriptionally active or inactive structure (Spiker and Thompson 
(1996) Plant Physiol. 110: 15-21). 

Sequence-determined DNA fragments (SDFs): "Sequence-determined DNA fragments," 
as used in the current invention, are isolated sequences of genes, fragments of genes, 
intergenic regions or contiguous DNA from plant genomic DNA or cDNA or RNA, the 
sequence of which has been determined. 

Signal Peptide: A "signal peptide" as used in the current invention is an amino acid 
sequence that targets the protein for secretion, for transport to an intracellular compartment or 
organelle or for incorporation into a membrane. Signal peptides are indicated in the tables 
and a more detailed description is located below. 
Specific Promoter: In the context of the current invention, "specific promoters" refers to a 
subset of inducible promoters that have a high preference for being induced in a specific 
tissue or cell and/or at a specific time during development of an organism. By "high 
preference" is meant at least 3-fold, preferably 5-fold, more preferably at least 10-fold, still 
more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired 
tissue over the transcription in any other tissue. Typical examples of temporal and/or tissue 
specific promoters of plant origin that can be used with the polynucleotides of the present 
invention, are: PTA29, a promoter which is capable of driving gene transcription specifically in 
tapetum and only during anther development (Koltonow et al., Plant Cell 2:1201 (1990)); RCc2 
and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. 
Biol. 27:237 (1995)); TobRB27, a root-specific promoter from tobacco (Yamamoto et al., Plant 
Cell 3:371 (1991)). Examples of tissue-specific promoters under developmental control include 
promoters that initiate transcription only in certain tissues or organs, such as root, ovule, fruit, 
seeds, or flowers. Other suitable promoters include those from genes encoding storage proteins 
or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above. 

Stringency: "Stringency" as used herein is a function of probe length, probe composition (G 
+ C content), and salt concentration, organic solvent concentration, and temperature of 
hybridization or wash conditions. Stringency is typically compared by the parameter Tm, which 
is the temperature at which 50% of the complementary molecules in the hybridization are 
hybridized, in terms of a temperature differential from Tm. High stringency conditions are those 
providing a condition of Tm - 5°C to Tm - 10°C. Medium or moderate stringency conditions are 
those providing Tm - 20°C to Tm - 29°C. Low stringency conditions are those providing a 
condition of Tm - 40°C to Tm - 48°C. The relationship of hybridization conditions to Tm (in °C) is 
expressed in the mathematical equation 

$$T_m = 81.5 - 16.6\,(\log_{10}[\mathrm{Na}^+]) + 0.41\,(\%\,\mathrm{G{+}C}) - (600/N) \tag{1}$$

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in 
length that are identical to the target sequence. The equation below for Tm of DNA-DNA 
hybrids is useful for probes in the range of 50 to greater than 500 nucleotides, and for conditions 
that include an organic solvent (formamide). 

$$T_m = 81.5 + 16.6\,\log\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\,\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\,\mathrm{formamide}) \tag{2}$$
where L is the length of the probe in the hybrid. (P. Tijssen, "Hybridization with Nucleic 
Acid Probes" in Laboratory Techniques in Biochemistry and Molecular Biology, P.C. van 
der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The Tm of equation (2) is affected by the 
nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for RNA- 
RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% 
decrease in homology when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), 
stringency conditions can be adjusted to favor detection of identical genes or related family 
members. 
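
As a worked illustration of equation (2), the following minimal sketch computes Tm for a DNA-DNA hybrid and applies the approximately 1°C-per-1% homology adjustment described above. Several points are assumptions rather than statements of the specification: the logarithm is taken as base 10, the formamide term is taken as subtracted (formamide lowers Tm), and the function and parameter names are hypothetical.

```python
import math


def tm_dna_dna(na_molar: float, gc_percent: float, length_nt: int,
               formamide_percent: float = 0.0) -> float:
    """Tm in degrees C for a DNA-DNA hybrid, following equation (2).

    Assumptions: log is base 10 and the formamide term is subtracted.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length_nt <= 0:
        raise ValueError("probe length must be positive")
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / length_nt - 0.63 * formamide_percent)


def tm_with_mismatch(tm_c: float, percent_homology: float) -> float:
    """Lower Tm by about 1 degree C per 1% decrease in homology (long probes)."""
    return tm_c - (100.0 - percent_homology)


if __name__ == "__main__":
    tm = tm_dna_dna(na_molar=1.0, gc_percent=50.0, length_nt=500)
    print(round(tm, 2))                          # perfectly matched hybrid
    print(round(tm_with_mismatch(tm, 90.0), 2))  # 90% homologous target
```

The DNA-RNA and RNA-RNA offsets mentioned in the text are deliberately left out of the sketch because the text gives them as ranges rather than single values.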

Equation (2) is derived assuming equilibrium and therefore, hybridizations according 
to the present invention are most preferably performed under conditions of probe excess and 
for sufficient time to achieve equilibrium. The time required to reach equilibrium can be 
shortened by inclusion of a hybridization accelerator such as dextran sulfate or another high 
volume polymer in the hybridization buffer. 

Stringency can be controlled during the hybridization reaction or after hybridization 
has occurred by altering the salt and temperature conditions of the wash solutions used. The 
formulas shown above are equally valid when used to compute the stringency of a wash 
solution. Preferred wash solution stringencies lie within the ranges stated above; high 
stringency is 5-8°C below Tm, medium or moderate stringency is 26-29°C below Tm and low 
stringency is 45-48°C below Tm. 
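
The preferred wash windows can be expressed as a simple lookup. The sketch below is illustrative only: it classifies a wash by how far its temperature lies below a previously computed Tm, using the 5-8°C, 26-29°C and 45-48°C windows stated above. The names are hypothetical, and temperatures falling between windows are reported as unclassified rather than forced into a category.

```python
from typing import Optional

# Preferred wash windows, in degrees C below Tm, as stated in the text.
WASH_WINDOWS = {
    "high": (5.0, 8.0),
    "medium": (26.0, 29.0),
    "low": (45.0, 48.0),
}


def classify_wash(tm_c: float, wash_temp_c: float) -> Optional[str]:
    """Return 'high', 'medium' or 'low', or None when outside every window."""
    delta = tm_c - wash_temp_c
    for label, (lower, upper) in WASH_WINDOWS.items():
        if lower <= delta <= upper:
            return label
    return None


if __name__ == "__main__":
    for wash in (64.0, 43.0, 24.0, 55.0):
        print(wash, classify_wash(70.0, wash))
```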


Substantially free of: A composition containing A is "substantially free of" B when 
at least 85% by weight of the total A+B in the composition is A. Preferably, A comprises at 
least about 90% by weight of the total of A+B in the composition, more preferably at least 
about 95% or even 99% by weight. For example, a plant gene or DNA sequence can be 
considered substantially free of other plant genes or DNA sequences. 
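
The weight criterion above reduces to a single ratio. As a minimal sketch, with hypothetical function and parameter names, the check can be written as follows; the 0.85 default corresponds to the 85% floor, and the preferred 90%, 95% or 99% levels can be passed as the threshold.

```python
def is_substantially_free(mass_a: float, mass_b: float,
                          threshold: float = 0.85) -> bool:
    """True when A makes up at least `threshold` of the total A+B by weight."""
    total = mass_a + mass_b
    if total <= 0:
        raise ValueError("total mass of A and B must be positive")
    return mass_a / total >= threshold


if __name__ == "__main__":
    print(is_substantially_free(90.0, 10.0))        # 90% A by weight
    print(is_substantially_free(90.0, 10.0, 0.95))  # fails the 95% level
```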

Translational start site: In the context of the current invention, a "translational start 
site" is usually an ATG in the cDNA transcript, more usually the first ATG. A single cDNA, 
however, may have multiple translational start sites. 


Transcription start site: "Transcription start site" is used in the current invention to 
describe the point at which transcription is initiated. This point is typically located about 25 
nucleotides downstream from a TFIID binding site, such as a TATA box. Transcription can 
initiate at one or more sites within the gene, and a single gene may have multiple transcriptional 
start sites, some of which may be specific for transcription in a particular cell-type or tissue. 

Untranslated region (UTR): A "UTR" is any contiguous series of nucleotide bases that is 
transcribed, but is not translated. These untranslated regions may be associated with 
particular functions such as increasing mRNA message stability. Examples of UTRs include, 
but are not limited to, polyadenylation signals, termination sequences, sequences located 
between the transcriptional start site and the first exon (5' UTR) and sequences located 
between the last exon and the end of the mRNA (3' UTR). 


Variant: The term "variant" is used herein to denote a polypeptide or protein or 
polynucleotide molecule that differs from others of its kind in some way. For example, 
polypeptide and protein variants can consist of changes in amino acid sequence and/or charge 
and/or post-translational modifications (such as glycosylation, etc). 



DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, 
rice, soybean or Arabidopsis and/or represent mRNA expressed from that genome. The isolated 
nucleic acid of the invention also encompasses corresponding fragments of the genome and/or 
cDNA complement of other organisms as described in detail below. 

Polynucleotides of the invention can be isolated from polynucleotide libraries using 
primers comprising sequence similar to those described by Table 1 and Table 2. See, for 
example, the methods described in Sambrook et al., supra. 

Alternatively, the polynucleotides of the invention can be produced by chemical 
synthesis. Such synthesis methods are described below. 



It is contemplated that the nucleotide sequences presented herein may contain some 
small percentage of errors. These errors may arise in the normal course of determination of 
nucleotide sequences. Sequence errors can be corrected by obtaining seeds deposited under the 
accession numbers cited above, propagating them, isolating genomic DNA or appropriate 
mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the 
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, 
and sequencing the amplification product. 

I.A. Probes, Primers and Substrates 

SDFs of the invention can be applied to substrates for use in array applications such 
as, but not limited to, assays of global gene expression, for example under varying conditions 
of development or growth. The arrays can also be used in diagnostic or forensic 
methods (WO95/35505, US 5,445,943 and US 5,410,270). 

Probes and primers of the instant invention will hybridize to a polynucleotide 
comprising a sequence in Tables 1 and 2. Though many different nucleotide sequences can 
encode an amino acid sequence, the sequences of Tables 1 and 2 are generally preferred for 
encoding polypeptides of the invention. However, the sequence of the probes and/or primers 
of the instant invention need not be identical to those in Tables 1 and 2 or the complements 
thereof. For example, some variation in probe or primer sequence and/or length can allow 
additional family members to be detected, as well as orthologous genes and more 
taxonomically distant related sequences. Similarly, probes and/or primers of the invention 
can include additional nucleotides that serve as a label for detecting the formed duplex or for 
subsequent cloning purposes. 

Probe length will vary depending on the application. For use as primers, probes are 
12-40 nucleotides, preferably 18-30 nucleotides long. For use in mapping, probes are 
preferably 50 to 500 nucleotides, more preferably 100-250 nucleotides long. For Southern 
hybridizations, probes as long as several kilobases can be used as explained below. 

The probes and/or primers can be produced by synthetic procedures such as the 
triester method of Matteucci et al. J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea 
et al. Proc. Natl. Acad. 80:7461 (1981) or using commercially available automated 
oligonucleotide synthesizers. 



I.B. Methods of Detection and Isolation 
The polynucleotides of the invention can be utilized in a number of methods known to 
those skilled in the art as probes and/or primers to isolate and detect polynucleotides, 
including, without limitation: Southerns, Northerns, Branched DNA hybridization assays, 
polymerase chain reaction, and microarray assays, and variations thereof. Specific methods 
given by way of examples, and discussed below include: 

Hybridization 

Methods of Mapping 

Southern Blotting 

Isolating cDNA from Related Organisms 
Isolating and/or Identifying Orthologous Genes. 

Also, the nucleic acid molecules of the invention can be used in other methods, such as high 
density oligonucleotide hybridizing assays, described, for example, in U.S. Pat. Nos. 
6,004,753; 5,945,306; 5,945,287; 5,945,308; 5,919,686; 5,919,661; 5,919,627; 5,874,248; 
5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO 9933981; WO 
9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 
9810858. 



B.1. Hybridization 
The isolated SDFs of Tables 1 and 2 of the present invention can be used as probes 
and/or primers for detection and/or isolation of related polynucleotide sequences through 
hybridization. Hybridization of one nucleic acid to another constitutes a physical property 
that defines the subject SDF of the invention and the identified related sequences. Also, such 
hybridization imposes structural limitations on the pair. A good general discussion of the 
factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular 
Cloning, a Laboratory Manual", 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold 
Spring Harbor, NY; see esp., chapters 11 and 12). Additional considerations and details of the 
physical chemistry of hybridization are provided by G.H. Keller and M.M. Manak, "DNA 
Probes", 2nd Ed. pp. 1-25, c. 1993 by Stockton Press, New York, NY. 

Depending on the stringency of the conditions under which these probes and/or primers 
are used, polynucleotides exhibiting a wide range of similarity to those in Tables 1 and 2 can be 
detected or isolated. When the practitioner wishes to examine the result of membrane 
hybridizations under a variety of stringencies, an efficient way to do so is to perform the 
hybridization under a low stringency condition, then to wash the hybridization membrane 
under increasingly stringent conditions. 
When using SDFs to identify orthologous genes in other species, the practitioner will 
preferably adjust the amount of target DNA of each species so that, as nearly as is practical, 
the same number of genome equivalents are present for each species examined. This 
prevents faint signals from species having large genomes, and thus small numbers of genome 
equivalents per mass of DNA, from erroneously being interpreted as absence of the 
corresponding gene in the genome. 
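
The genome-equivalent adjustment can be sketched as a proportional scaling: because the number of genome copies per unit mass of DNA falls as genome size rises, the mass loaded for each species is scaled by its genome size relative to a reference species. The function name and the genome sizes below are hypothetical placeholders chosen only to make the arithmetic visible, not measured values.

```python
from typing import Dict


def masses_for_equal_genome_equivalents(
    genome_sizes_bp: Dict[str, int],
    reference_species: str,
    reference_mass_ug: float,
) -> Dict[str, float]:
    """Mass of DNA per species giving the same number of genome copies.

    Genome copies per microgram are inversely proportional to genome size,
    so the required mass scales linearly with genome size.
    """
    if reference_species not in genome_sizes_bp:
        raise KeyError(f"unknown reference species: {reference_species}")
    ref_size = genome_sizes_bp[reference_species]
    return {
        species: reference_mass_ug * size / ref_size
        for species, size in genome_sizes_bp.items()
    }


if __name__ == "__main__":
    # Placeholder genome sizes (base pairs), for illustration only.
    sizes = {"species_a": 100_000_000, "species_b": 2_500_000_000}
    print(masses_for_equal_genome_equivalents(sizes, "species_a", 1.0))
```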

The probes and/or primers of the instant invention can also be used to detect or isolate 
nucleotides that are "identical" to the probes or primers. Two nucleic acid sequences or 
polypeptides are said to be "identical" if the sequence of nucleotides or amino acid residues, 
respectively, in the two sequences is the same when aligned for maximum correspondence as 
described below. 

Isolated polynucleotides within the scope of the invention also include allelic variants of 
the specific sequences presented in Tables 1 and 2. The probes and/or primers of the invention 
can also be used to detect and/or isolate polynucleotides exhibiting at least 80% sequence 
identity with the sequences of Tables 1 and 2 or fragments thereof. 

With respect to nucleotide sequences, degeneracy of the genetic code provides the 
possibility to substitute at least one base of the base sequence of a gene with a different base 
without causing the amino acid sequence of the polypeptide produced from the gene to be 
changed. Hence, the DNA of the present invention may also have any base sequence that has 
been changed from a sequence in Tables 1 and 2 by substitution in accordance with 
degeneracy of genetic code. References describing codon usage include: Carels et al., J. Mol. 
Evol. 46: 45 (1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993). 
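
A brief sketch may make the degeneracy point concrete: two coding sequences that differ only by synonymous codon substitutions translate to the same polypeptide. The example assumes the standard genetic code and uses short made-up sequences; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence whose length is a multiple of three."""
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True when both sequences translate to the identical amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    original = "ATGCTGGCC"  # Met-Leu-Ala
    variant = "ATGCTCGCG"   # synonymous changes in codons 2 and 3
    print(translate(original), translate(variant))
    print(encode_same_polypeptide(original, variant))
```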

B.2. Mapping 

The isolated SDF DNA of the invention can be used to create various types of genetic 
and physical maps of the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. 
Some SDFs may be absolutely associated with particular phenotypic traits, allowing 
construction of gross genetic maps. While not all SDFs will immediately be associated with 
a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with 
phenotypes of interest. Briefly, one method of mapping involves total DNA isolation from 
individuals. It is subsequently cleaved with one or more restriction enzymes, separated 
according to mass, transferred to a solid support, hybridized with SDF DNA and the pattern 
of fragments compared. Polymorphisms associated with a particular SDF are visualized as 
differences in the size of fragments produced between individual DNA samples after 
digestion with a particular restriction enzyme and hybridization with the SDF. After 
identification of polymorphic SDF sequences, linkage studies can be conducted. By using the 
individuals showing polymorphisms as parents in crossing programs, F2 progeny 
recombinants or recombinant inbreds, for example, are then analyzed. The order of DNA 
polymorphisms along the chromosomes can be determined based on the frequency with 
which they are inherited together versus independently. The closer two polymorphisms are 
together in a chromosome the higher the probability that they are inherited together. 
Integration of the relative positions of all the polymorphisms and associated marker SDFs can 
produce a genetic map of the species, where the distances between markers reflect the 
recombination frequencies in that chromosome segment. 
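
The idea that markers inherited together more often lie closer together can be sketched with a toy calculation. The example below assumes each marker has been scored in the same set of progeny lines as a string of parental allele calls ("A" or "B"), and uses the fraction of lines in which two markers carry alleles from different parents as a crude proxy for recombination frequency. The marker names and scores are invented for illustration, and no mapping function or correction for recombinant inbred populations is applied.

```python
from itertools import combinations
from typing import Dict, Tuple


def recombinant_fraction(calls_x: str, calls_y: str) -> float:
    """Fraction of progeny in which two markers carry different parental alleles."""
    if len(calls_x) != len(calls_y):
        raise ValueError("markers must be scored in the same progeny")
    if not calls_x:
        raise ValueError("no progeny scored")
    differing = sum(1 for x, y in zip(calls_x, calls_y) if x != y)
    return differing / len(calls_x)


def pairwise_fractions(markers: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Recombinant fraction for every pair of markers."""
    return {
        (a, b): recombinant_fraction(markers[a], markers[b])
        for a, b in combinations(markers, 2)
    }


if __name__ == "__main__":
    # Invented allele calls for three markers across ten progeny lines.
    scores = {
        "m1": "AAAAABBBBB",
        "m2": "AAAABBBBBB",
        "m3": "AAABBBBBBA",
    }
    for pair, fraction in pairwise_fractions(scores).items():
        print(pair, fraction)
```

In this toy data set the m1-m3 fraction equals the sum of the m1-m2 and m2-m3 fractions, which is the pattern expected when m2 lies between the other two markers.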

The use of recombinant inbred lines for such genetic mapping is described for 
Arabidopsis by Alonso-Blanco et al. (Methods in Molecular Biology, vol. 82, "Arabidopsis 
Protocols", pp. 137-146, J.M. Martinez-Zapater and J. Salinas, eds., c. 1998 by Humana 
Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp. 
249-254. In Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by 
Springer-Verlag New York, Inc.: New York, NY, USA; Berlin Germany; Burr et al. Genetics (1998) 
118: 519; Gardiner, J. et al., (1993) Genetics 134: 917). This procedure, however, is not 
limited to plants and can be used for other organisms (such as yeast) or for individual cells. 

The SDFs of the present invention can also be used for simple sequence repeat (SSR) 
mapping. Rice SSR mapping is described by Morgante et al. (The Plant Journal (1993) 3: 
165), Panaud et al. (Genome (1995) 38: 1170), Senior et al. (Crop Science (1996) 36: 1676), 
Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and General Genetics 
(1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance, 
polymorphisms are identified when sequence specific probes contained within an SDF 
flanking an SSR are made and used in polymerase chain reaction (PCR) assays with template 
DNA from two or more individuals of interest. Here, a change in the number of tandem 
repeats between the SSR-flanking sequences produces differently sized fragments (U.S. 
Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR 
fragment produced from the SSR-flanking sequence specific primer reaction as a probe 
against Southern blots representing different individuals (U.H. Refseth et al., (1997) 
Electrophoresis 18: 1519). 
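
The size difference that reveals an SSR polymorphism follows directly from the repeat count. As a minimal sketch with hypothetical names and made-up lengths, the expected PCR fragment length is the amplified flanking sequence (including the primers) plus the repeat unit length multiplied by the number of tandem repeats:

```python
def ssr_fragment_length(flank_5_bp: int, flank_3_bp: int,
                        repeat_unit: str, n_repeats: int) -> int:
    """Expected PCR fragment length across an SSR, in base pairs."""
    if min(flank_5_bp, flank_3_bp, n_repeats) < 0 or not repeat_unit:
        raise ValueError("lengths must be non-negative and repeat unit non-empty")
    return flank_5_bp + flank_3_bp + len(repeat_unit) * n_repeats


def is_polymorphic(length_a_bp: int, length_b_bp: int) -> bool:
    """Two individuals differ at the SSR when their fragment sizes differ."""
    return length_a_bp != length_b_bp


if __name__ == "__main__":
    # Made-up example: a GA dinucleotide repeat with 12 versus 15 copies.
    individual_1 = ssr_fragment_length(40, 35, "GA", 12)
    individual_2 = ssr_fragment_length(40, 35, "GA", 15)
    print(individual_1, individual_2, is_polymorphic(individual_1, individual_2))
```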
Genetic and physical maps of crop species have many uses. For example, these maps 
can be used to devise positional cloning strategies for isolating novel genes from the mapped 
crop species. In addition, because the genomes of closely related species are largely syntenic 
(that is, they display the same ordering of genes within the genome), these maps can be used 
to isolate novel alleles from relatives of crop species by positional cloning strategies. 

The various types of maps discussed above can be used with the SDFs of the 
invention to identify Quantitative Trait Loci (QTLs). Many important crop traits, such as the 
solids content of tomatoes, are quantitative traits and result from the combined interactions of 
several genes. These genes reside at different loci in the genome, oftentimes on different 
chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the 
invention can be used to identify QTLs and isolate specific alleles as described by de Vicente 
and Tanksley (Genetics 134:585 (1993)). In addition to isolating QTL alleles in present crop 
species, the SDFs of the invention can also be used to isolate alleles from the corresponding 
QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can 
then be created and the effects of the combinations measured. Once a desired allele 
combination has been identified, crop improvement can be accomplished either through 
biotechnological means or by directed conventional breeding programs (for review see 
Tanksley and McCouch, Science 277:1063 (1997)). 

In another embodiment, the SDFs can be used to help create physical maps of the 
genome of corn, Arabidopsis and related species. Where SDFs have been ordered on a 
genetic map, as described above, they can be used as probes to discover which clones in 
large libraries of plant DNA fragments in YACs, BACs, etc. contain the same SDF or similar 
sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal 
positions. Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more 
detailed studies of their sequence composition (e.g. Marra et al. (1997) Genome Research 
7:1072-1084) and by using their end or other sequences to find the identical sequences in 
other cloned DNA fragments. The overlapping of DNA sequences in this way allows large 
contigs of plant sequences to be built that, when sufficiently extended, provide a complete 
physical map of a chromosome. Sometimes the SDFs themselves will provide the means of 
joining cloned sequences into a contig. 

The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270
describe scanning multiple alleles of a plurality of loci using hybridization to arrays of
oligonucleotides. These techniques are useful for each of the types of mapping discussed
above.

Following the procedures described above and using a plurality of the SDFs of
the present invention, any individual can be genotyped. These individual genotypes can be
used for the identification of particular cultivars, varieties, lines, ecotypes and genetically
modified plants or can serve as tools for subsequent genetic studies involving multiple
phenotypic traits.

B.3 Southern Blot Hybridization 

The sequences from Tables 1 and 2 can be used as probes for various hybridization
techniques. These techniques are useful for detecting target polynucleotides in a sample or
for determining whether transgenic plants, seeds or host cells harbor a gene or sequence of
interest and thus might be expected to exhibit a particular trait or phenotype.

In addition, the SDFs from the invention can be used to isolate additional members of
gene families from the same or different species and/or orthologous genes from the same or
different species. This is accomplished by hybridizing an SDF to, for example, a Southern
blot containing the appropriate genomic DNA or cDNA. Given the resulting hybridization
data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments
by size, restriction sites, sequence and stated hybridization conditions from a gel or from a
library.

Identification and isolation of orthologous genes from closely related species and
alleles within a species are particularly desirable because of their potential for crop
improvement. Many important crop traits, such as the solid content of tomatoes, result from
the combined interactions of the products of several genes residing at different loci in the
genome. Generally, alleles at each of these loci can make quantitative differences to the trait.
By identifying and isolating numerous alleles for each locus from within a species or from
different species, transgenic plants with various combinations of alleles can be created and
the effects of the combinations measured. Once a more favorable allele combination has been
identified, crop improvement can be accomplished either through biotechnological means or
by directed conventional breeding programs (Tanksley and McCouch, Science 277:1063 (1997)).
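
The number of allele combinations to be tested grows multiplicatively with the number of loci. The following is a minimal sketch of enumerating those combinations; the locus names and allele labels are hypothetical placeholders, not data from Tables 1 and 2.

```python
from itertools import product
from typing import Dict, List

# Hypothetical loci and allele labels, for illustration only.
alleles_by_locus: Dict[str, List[str]] = {
    "locus_1": ["a1", "a2"],
    "locus_2": ["b1", "b2", "b3"],
    "locus_3": ["c1", "c2"],
}


def allele_combinations(alleles: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """List every way of choosing one allele per locus."""
    loci = list(alleles)
    choices = product(*(alleles[locus] for locus in loci))
    return [dict(zip(loci, choice)) for choice in choices]


combinations = allele_combinations(alleles_by_locus)
print(len(combinations))  # 2 * 3 * 2 combinations
```

Each resulting dictionary describes one candidate genotype whose effect on the trait would then be measured experimentally.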

The results from hybridizations of the SDFs of the invention to, for example,
Southern blots containing DNA from another species can also be used to generate restriction
fragment maps for the corresponding genomic regions. These maps provide additional
information about the relative positions of restriction sites within fragments, further
distinguishing mapped DNA from the remainder of the genome.
Physical maps can be made by digesting genomic DNA with different combinations
of restriction enzymes.
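
As a hedged illustration of how single and combined digests yield different fragment patterns, the sketch below cuts a short, made-up linear sequence in silico. EcoRI (G^AATTC) and SmaI (CCC^GGG) are used only as familiar examples and are not named in this specification; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Recognition site and cut offset (bases of the site left on the upstream fragment).
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "SmaI": ("CCCGGG", 3),   # CCC^GGG
}


def cut_positions(sequence: str, enzyme_names: List[str]) -> List[int]:
    """Return sorted cut coordinates for the named enzymes on a linear sequence.

    Only the given strand is searched, which suffices for palindromic sites.
    """
    positions = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = sequence.find(site)
        while start != -1:
            positions.add(start + offset)
            start = sequence.find(site, start + 1)
    return sorted(positions)


def fragment_lengths(sequence: str, enzyme_names: List[str]) -> List[int]:
    """Return fragment lengths, in map order, after digestion with the named enzymes."""
    cuts = [0] + cut_positions(sequence, enzyme_names) + [len(sequence)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


# Made-up 20-base sequence with two EcoRI sites and one SmaI site.
dna = "AAGAATTCCCGGGAATTCTT"
print(fragment_lengths(dna, ["EcoRI"]))
print(fragment_lengths(dna, ["SmaI"]))
print(fragment_lengths(dna, ["EcoRI", "SmaI"]))
```

Comparing the single digests with the double digest shows which fragments contain a site for the second enzyme, which is the information used to order restriction sites on a map.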

Probes for Southern blotting to distinguish individual restriction fragments can range
in size from 15 to 20 nucleotides to several thousand nucleotides. More preferably, the probe
is 100 to 1,000 nucleotides long for identifying members of a gene family when it is found
that repetitive sequences would complicate the hybridization. For identifying an entire
corresponding gene in another species, the probe is more preferably the length of the gene,
typically 2,000 to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used.
Some genes, however, might require probes up to 1,500 nucleotides long or overlapping
probes constituting the full-length sequence to span their lengths.

Also, while it is preferred that the probe be homogeneous with respect to its sequence,
it is not necessary. For example, as described below, a probe representing members of a gene
family having diverse sequences can be generated using PCR to amplify genomic DNA or
RNA templates using primers derived from SDFs that include sequences that define the gene
family.

For identifying corresponding genes in another species, the next most preferable
probe is a cDNA spanning the entire coding sequence, which allows all of the mRNA-coding
fragment of the gene to be identified. Probes for Southern blotting can easily be generated
from SDFs by making primers having the sequence at the ends of the SDF and using corn or
Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence
conserved among species, primers including the conserved sequence can be used for PCR
with genomic DNA from a species of interest to obtain a probe.
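
Deriving primers from the ends of an SDF can be sketched as follows. This is a minimal illustration only: the sequence is made up, the default primer length of 20 nucleotides is an assumption, and practical primer design would also weigh melting temperature, GC content and secondary structure.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_primers(sdf: str, primer_length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers taken from the two ends of `sdf`.

    The forward primer copies the 5' end of the given strand; the reverse
    primer is the reverse complement of the 3' end, so both read 5' to 3'.
    """
    if len(sdf) < 2 * primer_length:
        raise ValueError("sequence too short for two non-overlapping primers")
    forward = sdf[:primer_length]
    reverse = reverse_complement(sdf[-primer_length:])
    return forward, reverse


# Made-up 33-base sequence standing in for an SDF.
example_sdf = "ATGGCCAAGTTTGGCACCGATCTGAAACCGTAA"
forward, reverse = end_primers(example_sdf, primer_length=8)
print(forward, reverse)
```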

Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to
make primers and, with appropriate template DNA, used to make a probe to identify genes
containing the domain. Alternatively, the PCR products can be resolved, for example by gel
electrophoresis, and cloned and/or sequenced. Using Southern hybridization, the variants of
the domain among members of a gene family, both within and across species, can be
examined.

B.4.1 Isolating DNA from Related Organisms 

The SDFs of the invention can be used to isolate the corresponding DNA from other
organisms. Either cDNA or genomic DNA can be isolated. For isolating genomic DNA, a
lambda, cosmid, BAC or YAC, or other large insert genomic library from the plant of interest
can be constructed using standard molecular biology techniques as described in detail by
Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor
Laboratory Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular
Biology, Greene Publishing, New York).

To screen a phage library, for example, recombinant lambda clones are plated out on
appropriate bacterial medium using an appropriate E. coli host strain. The resulting plaques
are lifted from the plates using nylon or nitrocellulose filters. The plaque lifts are processed
through denaturation, neutralization, and washing treatments following the standard protocols
outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively
labeled or non-radioactively labeled SDF DNA at room temperature for about 16 hours,
usually in the presence of 50% formamide and 5X SSC (sodium chloride and sodium citrate)
buffer and blocking reagents. The plaque lifts are then washed at 42°C with 1% Sodium
Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot
analysis performed. For example, if a fragment hybridized under medium stringency (e.g.,
Tm - 20°C), then this condition is maintained or preferably adjusted to a less stringent
condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show detectable
hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones
are subsequently isolated for purification using the same general protocol outlined
above. Once the clone is purified, restriction analysis can be conducted to narrow the region
corresponding to the gene of interest. The restriction analysis and succeeding subcloning
steps can be done using procedures described by, for example, Sambrook et al. (1989) cited
above.
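
The relationship between the fixed 42°C wash and the Tm offsets can be made concrete with a small calculation. This sketch rests on a loudly stated assumption: that Tm rises by roughly 16.6°C for each tenfold increase in sodium ion concentration, a salt term found in commonly cited empirical Tm formulas. The Tm definition given later in this specification governs and may differ; the function names are hypothetical.

```python
def required_tm(wash_temp_c: float, offset_below_tm_c: float) -> float:
    """Tm the hybrid must have in the wash buffer for the wash to sit `offset` below Tm."""
    return wash_temp_c + offset_below_tm_c


def salt_fold_change(delta_tm_c: float, salt_coefficient: float = 16.6) -> float:
    """Fold change in Na+ concentration that shifts Tm by `delta_tm_c`.

    ASSUMPTION: Tm changes by `salt_coefficient` degrees C per log10 unit of Na+.
    """
    return 10 ** (delta_tm_c / salt_coefficient)


wash_temp = 42.0                                  # wash temperature stated in the text
tm_medium = required_tm(wash_temp, 20.0)          # Tm - 20 degrees C condition
tm_relaxed = required_tm(wash_temp, 30.0)         # Tm - 30 degrees C condition
fold = salt_fold_change(tm_relaxed - tm_medium)   # extra salt needed to relax the wash
print(tm_medium, tm_relaxed, round(fold, 1))
```

Under that assumption, moving from Tm - 20°C to Tm - 30°C at a constant 42°C corresponds to about a fourfold higher SSC concentration in the wash.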

The procedures outlined for the lambda library are essentially similar to those used for
YAC library screening, except that the YAC clones are harbored in bacterial colonies. The
YAC clones are plated out at reasonable density on nitrocellulose or nylon filters supported
by appropriate bacterial medium in petri plates. Following the growth of the bacterial clones,
the filters are processed through the denaturation, neutralization, and washing steps following
the procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library
screening are followed.

To isolate cDNA, similar procedures using appropriately modified vectors are
employed. For instance, the library can be constructed in a lambda vector appropriate for
cloning cDNA such as λgt11. Alternatively, the cDNA library can be made in a plasmid
vector. cDNA for cloning can be prepared by any of the methods known in the art, but is
preferably prepared as described above. Preferably, a cDNA library will include a high
proportion of full-length clones.

B.5 Isolating and/or Identifying Orthologous Genes

Probes and primers of the invention can be used to identify and/or isolate
polynucleotides related to those in Tables 1 and 2. Related polynucleotides are those that are
native to other plant organisms and either exhibit similar sequence or encode polypeptides with
similar biological activity. One specific example is an orthologous gene. Orthologous genes
have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in
closely related species, the percentage of identity can be 98 to 100%. The amino acid sequence
of a protein encoded by an orthologous gene can be less than 75% identical, but tends to be at
least 75% or at least 80% identical, more preferably at least 90%, most preferably at least 95%
identical to the amino acid sequence of the reference protein.
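
The identity thresholds above can be applied to a pair of aligned protein sequences as in the following minimal sketch. It assumes the two sequences are already aligned to equal length; the peptide sequences and the tier labels are hypothetical, chosen only to mirror the wording of the paragraph.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions with identical residues (gap '-' never counts as a match)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def preference_tier(identity: float) -> str:
    """Map a percent identity onto the tiers named in the text."""
    if identity >= 95.0:
        return "most preferred (at least 95%)"
    if identity >= 90.0:
        return "more preferred (at least 90%)"
    if identity >= 75.0:
        return "typical (at least 75%)"
    return "below the typical range (less than 75%)"


# Made-up 20-residue peptides differing at two positions.
reference = "MKTAYIAKQRQISFVKSHFS"
candidate = "MKTAYLAKQRQISFVRSHFS"
identity = percent_identity(reference, candidate)
print(identity, preference_tier(identity))
```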

To find orthologous genes, the probes are hybridized to nucleic acids from a species of interest
under low stringency conditions, preferably one where sequences containing as much as 40-45%
mismatches will be able to hybridize. This condition is established by Tm - 40°C to Tm - 48°C
(see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize.
More preferably, sequences 90 to 100% identical will hybridize and most preferably only
sequences greater than 95% identical will hybridize. One of ordinary skill in the art will
recognize that, due to degeneracy in the genetic code, amino acid sequences that are identical
can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for
example, to make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides,
and individually hybridize them to the same arrayed library to avoid the problem of degeneracy
introducing large numbers of mismatches.
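
Both points in the preceding paragraph, the roughly 67% identity floor and the overlapping probe series, can be checked with a short sketch. The two coding sequences are made up so that every codon differs only at its third position; the 30-nucleotide probe length and 15-nucleotide step are assumptions chosen from within the 24 to 45 nucleotide range stated above.

```python
from typing import List

# Partial codon table (standard genetic code) covering only the codons used below.
CODONS = {
    "GCT": "A", "GCC": "A", "GGT": "G", "GGC": "G", "CCT": "P",
    "CCC": "P", "ACT": "T", "ACC": "T", "GTT": "V", "GTC": "V",
}


def translate(dna: str) -> str:
    """Translate a coding sequence whose codons all appear in CODONS."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def tile_probes(sequence: str, probe_length: int = 30, step: int = 15) -> List[str]:
    """Split `sequence` into overlapping probes that together cover its full length."""
    if len(sequence) <= probe_length:
        return [sequence]
    last_start = len(sequence) - probe_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered
    return [sequence[s:s + probe_length] for s in starts]


gene_a = "GCTGGTCCTACTGTT"  # made-up sequence encoding AGPTV
gene_b = "GCCGGCCCCACCGTC"  # same peptide, every third base changed
print(translate(gene_a), translate(gene_b), round(percent_identity(gene_a, gene_b), 1))

probes = tile_probes(gene_a * 4)  # 60-nucleotide toy target
print(len(probes), [len(p) for p in probes])
```

Here identical peptides are encoded by DNA sequences that match at only two of every three positions, which is why short probes hybridized individually are preferred over a single long probe.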

As evolutionary divergence increases, genome sequences also tend to diverge. Thus,
one of skill will recognize that searches for orthologous genes between more divergent
species will require the use of lower stringency conditions compared to searches between
closely related species. Also, degeneracy of the genetic code is more of a problem for
searches in the genome of a species more distant evolutionarily from the species that is the
source of the SDF probe sequences.

Therefore the method described in Bouckaert et al., U.S. Ser. No. 60/121,700, Atty.
Dkt. No. 2750-117P, Client Dkt. No. 00010.001, filed February 25, 1999, hereby
incorporated in its entirety by reference, can be applied to the SDFs of the present invention
to isolate related genes from plant species which do not hybridize to the corn, Arabidopsis,
soybean, rice, wheat, and other plant sequences of Tables 1 and 2.

Identification of the relationship of nucleotide or amino acid sequences among plant 
species can be done by comparing the nucleotide or amino acid sequences of SDFs of the 
present application with nucleotide or amino acid sequences of other SDFs such as those 
present in applications listed in the table below: 
2750-0925P

Client Docket    Filed      Application No.
80116.002        5/12/00    09/570,738
00025.002        5/12/00    09/570,582
80117.002        5/12/00    09/570,768
91062.001        5/12/00    60/203,911
80254.001        5/12/00    60/203,915
00219.001        5/12/00    60/203,916
91006.002        5/12/00    09/570,581
00220.001        5/15/00    60/204,388
80255.001        5/15/00    60/204,122
91063.001        5/15/00    60/204,395
00221.001        5/16/00    60/204,568
80256.001        5/16/00    60/204,569
92002.003        5/17/00    60/205,325
00222.001        5/17/00    60/204,830
92001.003        5/17/00    60/205,233
80257.001        5/17/00    60/204,829
80258.001        5/18/00    60/205,058
91007.100        5/18/00    02/306,202
91007.101        5/18/00    00/304,161
00223.001        5/18/00    60/205,201
91007.002        5/18/00    09/573,655
91007.102        5/18/00    00/004,850
80259.001        5/19/00    60/205,243
00224.001        5/19/00    60/205,242
91064.001        5/22/00    60/205,574
00225.001        5/22/00    60/205,572
80260.001        5/22/00    60/205,576
80261.001        5/23/00    60/206,319
00226.001        5/23/00    60/206,316
00227.001        5/24/00    60/206,553
80262.001        5/24/00    60/206,545
91065.001        5/25/00    60/206,988
80264.001        5/26/00    60/207,354
00229.001        5/26/00    60/207,239
80263.001        5/26/00    60/207,243
91066.001        5/26/00    60/207,242
00228.001        5/26/00    60/207,367
80265.001        5/30/00    60/207,329
00230.001        5/30/00    60/207,452
91067.001        5/30/00    60/207,291
91068.001        6/1/00     60/208,324
80268.001        6/1/00     60/208,312
80266.002        6/1/00     60/208,421
80267.002        6/1/00     60/208,648
00231.001        6/1/00     60/208,329
80267.001        6/2/00     60/208,649
80266.001        6/2/00     60/209,338
00233.001        6/5/00     60/208,921
80270.001        6/5/00     60/208,920
91070.001        6/5/00     60/208,917
United States


2750-0924P 


80269.001 


6/5/00 


60/208,918 


United States 


2750-0923P 


00232.001 


6/5/00 


60/208,910 


United States 


2750-0922P 


91069.001 


6/5/00 


60/208,919 


United States 


2750-0931 P 


80271 .001 


6/8/00 


60/210,006 


United States 


2750-0930P 


00234.001 


6/8/00 


60/210,012 


United States 


2750-0929P 


91071.001 


6/8/00 


60/210,008 


Mexico 


2750-1 037F(MX) 


91 000.1 03(MX) 


6/9/00 


00/005,741 


United States 


2750-0932P 


00235.001 


6/9/00 


60/210,670 


United States 


2750-0928P 


00033.003 


6/9/00 


09/592,459 


Mexico 


2750-0928F(MX) 


00033.102 


6/9/00 


00/005,740 


Canada 


2750-0928F(CA) 


00033.100 


6/9/00 




United States 


2750-0933P 


80272.001 


6/9/00 


60/210,564 


Europe 


2750-0928F(EP) 


00033.101 


6/12/00 


00/304,943 


United States 


2750-0936P 


80273.001 


6/13/00 


60/211,214 


United States 


2750-0935P 


00237.001 


6/13/00 


60/211,213 


United States 


2750-0937P 


91072.001 


6/13/00 


60/211,210 


Canada 


2750-0934F(CA) 


00034.100 


6/14/00 




Europe 


2750-0934F(EP) 


00034.101 


6/14/00 


00/305,026 


United States 


2750-0934P 


00034.002 


6/14/00 


09/593,710 


Mexico 


2750-0934F(MX) 


00034.102 


6/14/00 


00/005,842 


United States 


2750-0938P 


00238.001 


6/15/00


60/211,539


United States 


2750-0939P 


80274.001 


6/15/00 


60/211,540 


United States 


2750-0940P 


91074.001 


6/15/00 


60/211,538


Europe 


2750-0942F(EP) 


00038.101 


6/16/00 




Europe 


2750-0941 F(EP) 


00037.101 


6/16/00


00/305,144 


Mexico 


2750-0941 F(MX) 


00037.102 


6/16/00 


00/005,950 


Canada 


2750-0943F(CA) 


00039.100 


6/16/00 




Europe 


2750-0943F(EP) 


00039.101 


6/16/00 


09/595,331 


United States 


2750-0955P 


80132.024 


6/16/00 


Mexico 


2750-0942F(MX) 


00038.102 


6/16/00 




United States 


2750-0948P 


80132.017 


6/16/00 


09/595,329 


Mexico 


2750-0943F(MX) 


00039.102 


6/16/00 




United States 


2750-0947P 


80132.016 


6/16/00 


09/595,335 


United States 


2750-0945P 


80132.014 


6/16/00 


09/595,333 


United States 


2750-0944P 


80132.013 


6/16/00


09/595,330 


United States 


2750-0943P 


00039.002 


6/16/00 


09/596,577 


United States 


2750-0954P 


80132.023 


6/16/00 


09/594,599 


United States 


2750-0950P 


80132.019 


6/16/00 


09/594,598 


Canada 


2750-0941 F(CA) 


00037.100 


6/16/00 




United States 


2750-0953P 


80132.022 


6/16/00 


09/595,298 


United States 


2750-0949P 


80132.018 


6/16/00


09/595,332 


United States 


2750-0941 P 


00037.002 


6/16/00 


09/595,334 


United States 


2750-0951 P 


80132.020 


6/16/00 


09/594,595 


United States 


2750-0946P 


80132.015 


6/16/00 


09/595,328 


Canada 


2750-0942F(CA) 


00038.100 


6/16/00 




United States 


2750-0942P 


00038.002 


6/16/00 


09/595,326 


United States 


2750-0952P 


80132.021 


6/16/00 


09/594,597 


United States 


2750-0958P 


91075.001 


6/19/00 


60/212,623 


United States 


2750-0957P 


80275.001 


6/19/00 


60/212,649 


United States 


2750-0956P 


00239.001 


6/19/00 


60/212,414 
United States


2750-0960P 


80276.001 


6/20/00 


60/212,713 




United States 


2750-0959P 


00240.001 


6/20/00 


60/212,677 




United States 


2750-0961 P 


91076.001 


6/20/00 


60/212,727 




United States 


2750-0971 P 


00042.003 


6/21/00 


09/602,660 




Canada 


2750-0971 F(CA) 


00042.100 


6/21/00 


02/309,874 




Europe 


2750-0971 F(EP) 


00042.101 


6/21/00 


00/305,249 




Mexico 


2750-0971 F(MX) 


00042.102 


6/21/00 


00/006,142 




United States 


2750-0967P 


91079.001 


6/22/00 


60/213,270 




United States 


2750-0966P 


80278.001 


6/22/00 


60/213,220 




United States 


2750-0965P 


00246.001 


6/22/00 


60/213,221 




Mexico 


2750-0972F(MX) 


00043.102 


6/22/00 


00/006,625 




Europe 


2750-0972F(EP) 


00043.101 


6/22/00 


00/305,270 




Canada 


2750-0972F(CA) 


00043.100 


6/22/00 


02/309,793 




United States 


2750-0972P 


00043.002 


6/22/00 


09/602,152 




United States 


2750-0963P 


80277.001 


6/22/00 


60/213,249 




United States 


2750-0962P 


00242.001 


6/22/00 


60/213,271 




United States 


2750-0964P 


91077.001 


6/22/00 


60/213,195 




Mexico 


2750-0974F(MX) 


00042.102 


6/23/00 






Mexico 


2750-0975F(MX) 


00045.102 


6/23/00 






Europe 


2750-0973F(EP) 


00044.101 


6/23/00 


00/305,305 




Canada 


2750-0975F(CA) 


00045.100 


6/23/00 






Europe 


2750-0974F(EP) 


00042.101 


6/23/00 






Mexico 


2750-0973F(MX) 


00044.102 


6/23/00 


00/006,267 




United States 


2750-0973P 


00044.002 


6/23/00 


09/602,205 




Canada 


2750-0973F(CA) 


00044.100 


6/23/00 


02/309,889 




Europe 


2750-0975F(EP) 


00045.101 


6/23/00 






United States 


2750-0975P 


00045.002 


6/23/00 


09/602,016 




Canada 


2750-0974F(CA) 


00042.100 


6/23/00 






United States 


2750-1 035P 


00248.001 


6/27/00 


60/214,534 




United States 


2750-0968P 


00247.001 


6/27/00 






United States 


2750-0969P 


80279.001 


6/27/00 


60/214,762 




United States 


2750-0970P 


91080.001 


6/27/00 


60/214,524 




United States 


2750-1 036P 


80280.001 


6/27/00 


60/214,535 




Canada 


2750-0976F(CA) 


00046.100 


6/28/00 






Europe 


2750-0976F(EP) 


00046.101 


6/28/00 






Mexico 


2750-0976F(MX) 


00046.102 


6/28/00 


60/214,800 




United States 


2750-1 038P 


00249.001 


6/28/00 




United States 


2750-1 039P 


80281 .001 


6/28/00 


60/214,799 




United States 


2750-0976P 


00046.002 


6/28/00 


09/605,843 




Canada 


2750-0977F(CA) 


00048.100 


6/29/00 






Europe 


2750-0977F(EP) 


00048.101 


6/29/00 






Mexico 


2750-0977F(MX) 


00048.102 


6/29/00 






United States 


2750-0977P 


00048.002 


6/29/00 


09/606,181 




Europe 


2750-0981 F(EP) 


00052.101 


6/30/00 






United States 


2750-0980P 


00051 .002 


6/30/00 


09/610,157 




Canada 


2750-0979F(CA) 


00050.100 


6/30/00 






Europe 


2750-0979F(EP) 


00050.101 


6/30/00 






Mexico 


2750-0979F(MX) 


00050.102 


6/30/00 


60/215,127 




United States 


2750-1 041 P 


80282.001 


6/30/00 




United States 


2750-1 040P 


00250.001 


6/30/00 


60/215,775 
United States


2750-0979P 


00050.002 


6/30/00 


09/607,081 


United States 


2750-0981 P 


00052.002 


6/30/00 


09/609,198 


Mexico 


2750-0980F(MX) 


00051.102 


6/30/00 




Canada 


2750-0980F(CA) 


00051.100 


6/30/00 




Mexico 


2750-0978F(MX) 


00049.102 


6/30/00 




Europe 


2750-0978F(EP) 


00049.101 


6/30/00 




Canada 


2750-0978F(CA) 


00049.100 


6/30/00 




Canada 


2750-0981 F(CA) 


00052.100 


6/30/00 




Europe 


2750-0980F(EP) 


00051.101 


6/30/00 




United States 


2750-0978P 


00049.002 


6/30/00 


09/608,960 


Mexico 


2750-0981 F(MX) 


00052.102 


6/30/00 




United States 


2750-1 043P 


80283.001 


7/5/00 


60/216,362 


United States 


2750-1 042P 


00252.001 


7/5/00 




Mexico 


2750-0982F(MX) 


00053.102 


7/6/00 




United States 


2750-0982P 


00053.002 


7/6/00 


09/611,409


Europe 


2750-0982F(EP) 


00053.101 


7/6/00 




Canada 


2750-0982F(CA) 


00053.100 


7/6/00 




Mexico 


2750-0983F(MX) 


00054.102 


7/7/00 




United States 


2750-0984P 


00058.002 


7/7/00 


09/613,547 


Europe 


2750-0984F(EP) 


00058.101 


7/7/00 




Mexico 


2750-0984F(MX) 


00058.102 


7/7/00 




United States 


2750-0983P 


00054.002 


7/7/00 


09/612,645 


Canada 


2750-0984F(CA) 


00058.100 


7/7/00 




Europe 


2750-0983F(EP) 


00054.101 


7/7/00 




Canada 


2750-0983F(CA) 


00054.100 


7/7/00 




United States 


2750-1 044P 


91081.001 


7/11/00 


60/217,385 


United States 


2750-1 045P 


00253.001 


7/11/00 


60/217,476 


United States 


2750-1 046P 


80284.001 


7/11/00 


60/217,384 


United States 


2750-0985P 


00059.002 


7/12/00 


09/615,007 


Canada 


2750-0985F(CA) 


00059.100 


7/12/00 




Europe 


2750-0985F(EP) 


00059.101 


7/12/00 




United States 


2750-1 050P 


80286.002 


7/12/00 




Mexico 


2750-0985F(MX) 


00059.102 


7/12/00 




United States 


2750-1 052P 


80287.002 


7/12/00 




United States 


2750-0986P 


00060.002 


7/13/00 


09/615,748 


Canada 


2750-0986F(CA) 


00060.100 


7/13/00 




United States 


2750-1 054P 


80288.002 


7/13/00 




Mexico 


2750-0986F(MX) 


00060.102 


7/13/00 




Europe 


2750-0986F(EP) 


00060.101 


7/13/00 




Canada 


2750-0988F(CA) 


00062.100 


7/14/00 




United States 


2750-1 060P 


80134.017 


7/14/00 


09/614,388 


United States 


2750-0987P 


00061 .002 


7/14/00 


09/617,525 


Canada 


2750-0987F(CA) 


00061.100 


7/14/00 




Europe 


2750-0987F(EP) 


00061.101 


7/14/00 




United States 


2750-1 061 P 


80134.018 


7/14/00 


09/614,450 


United States 


2750-0988P 


00062.002 


7/14/00 


09/617,203 


Europe 


2750-0988F(EP) 


00062.101 


7/14/00 




Mexico 


2750-0988F(MX) 


00062.102 


7/14/00 




United States 


2750-1 048P 


80285.002 


7/14/00 




Mexico 


2750-0987F(MX) 


00061.102 


7/14/00 
United States


2750-1 055P 


91 082.001 


7/18/00 




United States 


2750-1 057P 


80291 .001 


7/18/00 




United States 


2750-1 056P 


00254.001 


7/18/00 




Mexico 


2750-0989F(MX) 


00064.102 


7/19/00 




Europe 


2750-0989F(EP) 


00064.101 


7/19/00 




United States 


2750-0989P 


00064.002 


7/19/00 


09/620,421 


United States 


2750-1 062P 


80134.020 


7/19/00 


09/617,683 


United States 


2750-1 063P 


80134.022 


7/19/00 


09/617,682 


United States 


2750-1 064P 


80134.024 


7/19/00 


09/617,681 


Canada 


2750-0989F(CA) 


00064.100 


7/19/00




Canada 


2750-0990F(CA) 


00065.100 


7/20/00 




United States 


2750-0990P 


00065.002 


7/20/00 


09/621,323


Europe 


2750-0990F(EP) 


00065.101 


7/20/00 




Mexico 


2750-0990F(MX) 


00065.102 


7/20/00 




United States 


2750-1 065P 


80134.026 


7/20/00 


09/620,998 


United States 


2750-1 066P 


80135.004 


7/20/00 


09/620,978 


United States 


2750-1 073P 


80134.025 


7/21/00 


09/620,314 


United States 


2750-1 072P 


80135.003 


7/21/00 


09/621,900 


United States 


2750-0992P 


00067.002 


7/21/00 


09/621,660


Canada 


2750-0992F(CA) 


00067.100 


7/21/00 




Europe 


2750-0992F(EP) 


00067.101 


7/21/00 




United States 


2750-1 069P 


80134.023 


7/21/00 


09/620,313 


United States 


2750-1 071 P 


80134.021 


7/21/00 


09/620,390 


United States 


2750-1 070P 


80134.019 


7/21/00 


09/620,111 


United States 


2750-1 068P 


80134.016 


7/21/00 


09/620,393 


Mexico 


2750-0993F(MX) 


00069.102 


7/21/00 




Europe 


2750-0993F(EP) 


00069.101 


7/21/00 




Canada 


2750-0993F(CA) 


00069.100 


7/21/00 




United States 


2750-0991 P 


00066.002 


7/21/00 


09/621,630 


Canada 


2750-0991 F(CA) 


00066.100 


7/21/00 




United States 


2750-0993P 


00069.002 


7/21/00 


09/621,902


Europe 


2750-0991 F(EP) 


00066.101 


7/21/00 




Mexico 


2750-0991 F(MX) 


00066.102 


7/21/00 




Mexico 


2750-0992F(MX) 


00067.102 


7/21/00 




United States 


2750-1 067P 


80134.015 


7/21/00 


09/620,394 


United States 


2750-1 059P 


00255.001 


7/25/00 




United States 


2750-1 079P 


80292.001 


7/25/00 




United States 


2750-1 081 P 


80293.001 


7/25/00 




United States 


2750-1 058P 


91083.001 


7/25/00 




United States 


2750-1 080P 


00256.001 


7/25/00 




Canada 


2750-0994F(CA) 


00070.100 


7/26/00 




Europe 


2750-0994F(EP) 


00070.101 


7/26/00 




Mexico 


2750-0994F(MX) 


00070.102 


7/26/00 




United States 


2750-0994P 


00070.002 


7/26/00 




United States 


2750-1 075P 


80136.005 


7/27/00 


09/628,987 


United States 


2750-0995P 


00071 .002 


7/27/00 


09/628,986 


Canada 


2750-0995F(CA) 


00071.100 


7/27/00 




United States 


2750-1 074P 


801 36.004 


7/27/00 


09/628,984 


Europe 


2750-0995F(EP) 


00071.101 


7/27/00 




Mexico 


2750-0995F(MX) 


00071.102 


7/27/00 
Europe


2750-0996F(EP) 


00072.101 


7/28/00 




Canada 


2750-0996F(CA) 


00072.100 


7/28/00 




Mexico 


2750-0996F(MX) 


00072.102 


7/28/00 




United States 


2750-0996P 


00072.002 


7/28/00 


09/628,552 


United States 


2750-1 049P 


80286.001 


8/1/00 




United States 


2750-1 256P 


80299.002 


8/1/00 




United States 


2750-1 077P 


80137.004 


8/2/00 


09/630,442 


Mexico 


2750-0997F(MX) 


00073.102 


8/2/00 




Europe 


2750-0997F(EP) 


00073.101 


8/2/00 




Canada 


2750-0997F(CA) 


00073.100 


8/2/00 




United States 


2750-0997P 


00073.002 


8/2/00 


09/632,340 


United States 


2750-1 076P 


80137.003 


8/2/00 


09/628,985 


United States 


2750-1 257P 


80299.003 


8/3/00 




United States 


2750-1 05 1P 


80287.001 


8/3/00 




United States 


2750-0998P 


00074.002 


8/3/00 


09/632,349 


United States 


2750-1 258P 


80299.004 


8/3/00 




United States 


2750-1 053P 


80288.002 


8/3/00 


60/223,100 


United States 


2750-1 255P 


80299.001 


8/3/00 




Mexico 


2750-0998F(MX) 


00074.102 


8/3/00 




Europe 


2750-0998F(EP) 


00074.101 


8/3/00 




Canada 


2750-0998F(CA) 


00074.100 


8/3/00 




United States 


2750-1 078P 


80138.003 


8/4/00 


09/635,640 


Europe 


2750-1000F(EP)


00077.101 


8/4/00 




Canada 


2750-1000F(CA)


00077.100 


8/4/00 




United States 


2750-1 000P 


00077.002 


8/4/00 


09/633,191 


Mexico 


2750-1 001 F(MX) 


00079.102 


8/4/00 




Mexico 


2750-1000F(MX)


00077.102 


8/4/00 




Europe 


2750-1 001 F(EP) 


00079.101 


8/4/00 




United States 


2750-1 001 P 


00079.002 


8/4/00 


09/633,239 


Mexico 


2750-0999F(MX) 


00076.102 


8/4/00 




United States 


2750-1 092P 


80138.004 


8/4/00 


09/635,643 


Canada 


2750-1 001 F(CA) 


00079.100 


8/4/00 




United States 


2750-0999P 


00076.002 


8/4/00 


09/633,051 


Canada 


2750-0999F(CA) 


00076.100 


8/4/00 




Europe 


2750-0999F(EP) 


00076.101 


8/4/00 




United States 


2750-1 047P 


80285.001 


8/7/00 




United States 


2750-1 094P 


80139.004 


8/9/00 


09/635,642 


United States 


2750-1 114P 


92001 .004 


8/9/00 


60/224,390 


Europe 


2750-1 002F(EP) 


00080.101 


8/9/00 




United States 


2750-1 115P 


92002.004 


8/9/00 




Mexico 


2750-1 002F(MX) 


00080.102 


8/9/00 




Canada 


2750-1 002F(CA) 


00080.100 


8/9/00 




United States 


2750-1 002P 


00080.002 


8/9/00 


09/635,277 


United States 


2750-1 093P 


80139.003 


8/10/00 


09/635,641 


Canada 


2750-1 005F(CA) 


00083.100 


8/11/00 




United States 


2750-11 01 P 


80141 .005 


8/11/00 




United States 


2750-1 005P 


00083.002 


8/11/00 


09/637,563 


United States 


2750-1 096P 


80142.004 


8/11/00 


09/637,780 


United States 


2750-1 1 03P 


80141.007 


8/11/00 




Europe 


2750-1 005F(EP) 


00083.101 


8/11/00 
Canada
United States 
Mexico 
United States 
Canada 
Europe 
Mexico 
United States 
Europe 
Mexico 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
Europe 
Mexico 
Canada 
Mexico 
United States 
Europe 
United States 
United States 
Canada 
United States 
Canada 
Mexico 
United States 
Mexico
Europe 
United States 
United States 
Canada 
Europe 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 
United States 



2750-1 003F(CA) 


00081.100 


8/11/00 


2750-1 003P 


00081 .002 


8/11/00 


2750-1 005F(MX) 


00083.102 


8/11/00 


2750-1 004P 


00082.002 


8/11/00 


2750-1 004F(CA) 


00082.100 


8/11/00 


2750-1 004F(EP) 


00082.101 


8/11/00 


2750-1 004F(MX) 


00082.102 


8/11/00 


2750-1 1 04P 


80141.008 


8/11/00 


2750-1 003F(EP) 


00081.101 


8/11/00 


2750-1 003F(MX) 


00081.102 


8/11/00 


2750-1 102P 


80141.006 


8/11/00 


2750-1 083P 


91084.001 


8/14/00 


2750-1 082P 


80298.001 


8/14/00 


2750-1 084P 


80300.001 


8/15/00 


2750-1 085P 


91085.001 


8/15/00 


2750-1 106P 


80294.002 


8/16/00 


2750-1 1 1 0P 


80296.002 


8/16/00


2750-1 108P 


80295.002 


8/16/00 


2750-1 112P 


80297.002 


8/16/00 


2750-1 095P 


80142.003 


8/16/00 


2750-1 006P 


00084.002 


8/17/00 


2750-1 006F(EP) 


00084.101 


8/17/00 


2750-1 006F(MX) 


00084.102 


8/17/00


2750-1 006F(CA) 


00084.100 


8/17/00


2750-1 007F(MX) 


00085.102 


8/18/00


2750-1 1 1 1 P 


80297.001 


8/18/00 


2750-1 007F(EP) 


00085.101 


8/18/00 


2750-1 1 09P 


80296.001 


8/18/00 


2750-1 107P 


80295.001 


8/18/00 


2750-1 007F(CA) 


00085.100 


8/18/00 


2750-1 007P 


00085.002 


8/18/00 


2750-1 009F(CA) 


00087.100 


8/18/00 


2750-1 008F(MX) 


00086.102 


8/18/00 


2750-1 105P 


80294.001 


8/18/00 


2750-1 009F(MX) 


00087.102 


8/18/00 


2750-1 009F(EP) 


00087.101 


8/18/00 


2750-1 098P 


80143.004 


8/18/00 


2750-1 008P 


00086.002 


8/18/00 


2750-1 008F(CA) 


00086.100 


8/18/00


2750-1 008F(EP) 


00086.101 


8/18/00 


2750-1 009P 


00087.002 


8/18/00


2750-1 086P 


80301.001


8/21/00 


2750-1 087P 


91086.001 


8/21/00 


2750-1 117P 


80289.002 


8/23/00 


2750-1 118P 


80289.003 


8/23/00 


2750-1 122P 


80289.007 


8/23/00 


2750-1 132P 


80289.017 


8/23/00 


2750-1 124P 


80289.009 


8/23/00 


2750-1 131P


80289.016


8/23/00 


2750-1 116P 


80289.001 


8/23/00 



09/637,837 
09/636,555 



09/643,672 
09/641,198 



60/226,324 



09/643,671 



09/640,695
2750-1 123P


80289.008 


8/23/00 




2750-1 010F(EP)


00088.101 


8/23/00 




2750-1 010P


00088.002 


8/23/00 


09/643,854 


2750-1 010F(MX)


00088.102 


8/23/00 




2750-1 162P 


80307.001 


8/23/00 




2750-1 163P


91087.001


8/23/00 




2750-1 097P 


80143.003 


8/23/00 


09/649,866 


2750-1 010F(CA)


00088.100 


8/23/00 




2750-1 013F(MX)


00091.102 


8/25/00 




2750-1 099P 


80144.003 


8/25/00 


09/649,868 


2750-1 133P 


80289.018 


8/25/00 




2750-1 140P 


80289.025 


8/25/00 




2750-1 145P 


80289.030 


8/25/00 


60/227,728 


2750-1 150P 


80289.035 


8/25/00 




2750-1 151P


80289.036 


8/25/00 




2750-1 149P 


80289.034 


8/25/00 




2750-1 146P 


80289.031 


8/25/00 




2750-1 147P 


80289.032 


8/25/00 




2750-1 144P


80289.029 


8/25/00 




2750-1 143P 


80289.028 


8/25/00 




2750-1 137P 


80289.022 


8/25/00 




2750-1 136P 


80289.021 


8/25/00 




2750-1 135P 


80289.020 


8/25/00 




2750-1 134P 


80289.019 


8/25/00 




2750-1 120P 


80289.005 


8/25/00 




2750-1 013P


00091.002


8/25/00 




2750-1 013F(CA)


00091.100 


8/25/00 




2750-1 152P 


80289.037 


8/25/00 




2750-1 012F(MX)


00090.102 


8/25/00 




2750-1 126P 


80289.011


8/25/00 




2750-1 013F(EP)


00091.101 


8/25/00 




2750-1 011P


00089.002 


8/25/00 




2750-1 011F(CA)


00089.100 


8/25/00 




2750-1 011F(EP)


00089.101 


8/25/00 




2750-1 011F(MX)


00089.102 


8/25/00 




2750-1 138P 


80289.023 


8/25/00 


60/227,729 


2750-1 119P 


80289.004 


8/25/00 


60/228,029 


2750-1 100P


80144.004 


8/25/00 


09/649,867 


2750-1 148P 


80289.033 


8/25/00 




2750-1 142P 


80289.027 


8/25/00 


60/227,733 


2750-1 128P 


80289.013 


8/25/00 


60/227,726 


2750-1 139P 


80289.024 


8/25/00 


60/228,167 


2750-1 121P


80289.006 


8/25/00 




2750-1 012F(EP)


00090.101 


8/25/00 




2750-1 012F(CA)


00090.100 


8/25/00 




2750-1 012P


00090.002 


8/25/00 




2750-1 141P


80289.026 


8/25/00 




2750-1 127P 


80289.012 


8/25/00 




2750-1 225P 


80308.001 


8/30/00 




2750-1 014F(MX)


00092.102 


8/30/00
Europe


2750-1 014F(EP)


00092.101 


8/30/00 




Canada 


2750-1 014F(CA)


00092.100 


8/30/00 




United States 


2750-1 226P 


91088.001 


8/30/00 




United States 


2750-1 014P


00092.002 


8/30/00 


09/651,370


United States 


2750-1 090P 


80303.001 


8/31/00 




United States 


2750-1 089P 


80302.002 


8/31/00 




United States 


2750-1 088P 


80302.001 


8/31/00 




Mexico 


2750-1 015F(MX)


00093.102 


8/31/00 




United States 


2750-1 015P


00093.002 


8/31/00 


09/653,466 


United States 


2750-1 091P


80303.002 


8/31/00 




Canada 


2750-1 015F(CA)


00093.100 


8/31/00 




Europe 


2750-1 015F(EP)


00093.101 


8/31/00 




United States 


2750-1 113P 


80304.001 


8/31/00 




Mexico 


2750-1 016F(MX)


00094.102 


9/1/00 




United States 


2750-1 016P


00094.002 


9/1/00 


09/654,547 


Europe 


2750-1 016F(EP)


00094.101 


9/1/00 




Canada 


2750-1 016F(CA)


00094.100 


9/1/00 




United States 


2750-1 228P 


91089.001 


9/6/00 




United States 


2750-1 227P 


80309.001 


9/6/00 




United States 


2750-1 160P 


80306.001 


9/6/00 




United States 


2750-1 159P 


80305.002 


9/6/00 




United States 


2750-1 158P 


80305.001 


9/6/00 




United States 


2750-1 157P 


80304.002 


9/6/00 




United States 


2750-1 161P


80306.002 


9/6/00 




Mexico 


2750-1 017F(MX)


00095.102 


9/7/00 




Canada 


2750-1 017F(CA)


00095.100 


9/7/00 




United States 


2750-1 017P


00095.002 


9/7/00 




Europe 


2750-1 017F(EP)


00095.101 


9/7/00 




Mexico 


2750-1 018F(MX)


00096.102 


9/8/00 




Europe 


2750-1 018F(EP)


00096.101 


9/8/00 




Canada 


2750-1 018F(CA)


00096.100 


9/8/00 




United States 


2750-1 018P


00096.002 


9/8/00 


09/657,569 


United States 


2750-1 230P 


91090.001


9/13/00 




Europe 


2750-1 019F(EP)


00098.101 


9/13/00 




Canada 


2750-1 019F(CA)


00098.100 


9/13/00 




United States 


2750-1 019P


00098.002 


9/13/00 


09/660,883 


United States 


2750-1 229P 


80310.001


9/13/00 




Mexico 


2750-1 019F(MX)


00098.102 


9/13/00 




United States 


2750-1 231P


80311.001


9/15/00 




United States 


2750-1 021P


00101.002 


9/15/00 




Canada 


2750-1 021F(CA)


00101.100 


9/15/00 




Europe 


2750-1 021F(EP)


00101.101 


9/15/00




Mexico 


2750-1 021F(MX)


00101.102 


9/15/00 




Canada 


2750-1 020F(CA) 


00099.100 


9/15/00 




United States 


2750-1 232P 


91091.001 


9/15/00 




Europe 


2750-1 020F(EP) 


00099.101 


9/15/00 




Mexico 


2750-1 020F(MX) 


00099.102 


9/15/00 




United States 


2750-1 020P 


00099.002 


9/15/00 




United States 


2750-1 233P 


80312.001 


9/18/00 




United States 


2750-1 234P 


91092.001 


9/18/00
Mexico
Europe
Canada
Canada
United States
United States
United States
2750-1 253P
2750-1 022F(MX) 
2750-1 022F(EP) 
2750-1 022P 
2750-1 022F(CA) 
2750-1 254P 
2750-1 259P 
2750-1 260P 
2750-1 261P
2750-1 023F(MX) 
2750-1 025F(MX) 
2750-1 025P 
2750-1 023F(EP) 
2750-1 025F(CA) 
2750-1 023F(CA) 
2750-1 023P 
2750-1 024F(MX) 
2750-1 024F(EP) 
2750-1 024F(CA) 
2750-1 024P 
2750-1 025F(EP) 
2750-1 263P 
2750-1 264P 
2750-1 265P 
2750-1 262P 
2750-1 266P 
2750-1 267P 
2750-1 026P 
2750-1 268P 
2750-1 027P 
2750-1 269P 
2750-1 271P
2750-1 270P 
2750-1 028P 
2750-1 273P 
2750-1 272P 
2750-1 029P 
2750-1 274P 
2750-1 032P 
2750-1 031P
2750-1 030P 



80313.001 


9/20/00 


00102.102 


9/20/00 


00102.101 


9/20/00 


00102.002 


9/20/00 


00102.100 


9/20/00 


91093.001 


9/20/00 


80314.001 


9/21/00 


91094.001 


9/21/00 


91095.001 


9/21/00 


00103.102 


9/22/00 


00105.102 


9/22/00 


00105.002 


9/22/00 


00103.101 


9/22/00 


00105.100 


9/22/00 


00103.100 


9/22/00 


00103.002 


9/22/00 


00104.102 


9/22/00 


00104.101 


9/22/00 


00104.100 


9/22/00 


00104.002 


9/22/00 


00105.101 


9/22/00 


91096.001 


9/25/00 


80316.001 


9/25/00 


91097.001 


9/25/00 


80315.001 


9/25/00 


80317.001 


9/26/00 


91098.001 


9/27/00 


00106.002 


9/28/00 


91078.001 


9/28/00 


00107.002 


9/29/00 


91099.001 


9/29/00 


91100.001 


10/2/00 


80318.001 


10/2/00 


00108.002 


10/4/00 


91101.001 


10/4/00 


80319.001 


10/4/00 


00109.002 


10/5/00 


80320.001 


10/5/00 


00112.002 


10/6/00 


00111.002


10/6/00 


00110.002 


10/6/00 



All applications listed in the table above are expressly incorporated herein by 
reference. 

The SDFs of the invention can also be used as probes to search for genes that are 
related to the SDF within a species. Such related genes are typically considered to be 
members of a gene family. In such a case, the sequence similarity will often be concentrated
into one or a few fragments of the sequence. The fragments of similar sequence that define
the gene family typically encode a fragment of a protein or RNA that has an enzymatic or
structural function. The percentage of identity in the amino acid sequence of the domain that
defines the gene family is preferably at least 70%, more preferably 80 to 95%, most
preferably 85 to 99%. To search for members of a gene family within a species, a low 
stringency hybridization is usually performed, but this will depend upon the size, distribution 
and degree of sequence divergence of domains that define the gene family. SDFs 
encompassing regulatory regions can be used to identify coordinately expressed genes by 
using the regulatory region sequence of the SDF as a probe. 
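
As a purely illustrative sketch, the percentage of identity over a domain can be computed from two sequences that have already been aligned. The function names, the treatment of gap characters, and the default cutoff argument below are assumptions made for this example; only the 70% value is taken from the paragraph above. Other conventions for the denominator exist and give somewhat different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, gap-free columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns containing a gap ('-') in either sequence are ignored.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_family_threshold(aligned_a: str, aligned_b: str, minimum: float = 70.0) -> bool:
    """True if the aligned domain reaches the minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

For example, two ten-residue domains that differ at a single position score 90% and would satisfy the 70% cutoff.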

In the instances where the SDFs are identified as being expressed from genes that confer 
a particular phenotype, then the SDFs can also be used as probes to assay plants of different 
species for those phenotypes. 

I.C. Methods to Inhibit Gene Expression 

The nucleic acid molecules of the present invention can be used to inhibit gene 
transcription and/or translation. Examples of such methods include, without limitation:
Antisense Constructs; 
Ribozyme Constructs; 
Chimeraplast Constructs; 
Co-Suppression; 
Transcriptional Silencing; and 
Other Methods to Inhibit Gene Expression.

C.1. Antisense

In some instances it is desirable to suppress expression of an endogenous or 
exogenous gene. A well-known instance is the FLAVOR-SAVOR™ tomato, in which the 
gene encoding ACC synthase is inactivated by an antisense approach, thus delaying softening 
of the fruit after ripening. See for example, U.S. Patent No. 5,859,330; U.S. Patent No. 
5,723,766; Oeller et al., Science, 254:437-439 (1991); and Hamilton et al., Nature, 346:284-287
(1990). Also, timing of flowering can be controlled by suppression of the FLOWERING
LOCUS C (FLC); high levels of this transcript are associated with late flowering, while
absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell 11:949
(1999)). Also, the transition of apical meristem from production of leaves with associated
shoots to flowering is regulated by TERMINAL FLOWER1 (TFL1), APETALA1 and LEAFY. Thus,
when it is desired to induce a transition from shoot production to flowering, it is desirable to
suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As another instance,
arrested ovule development and female sterility result from suppression of the ethylene
forming enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant
Cell 11:1061 (1999)). The ability to manipulate female fertility of plants is useful in
increasing fruit production and creating hybrids.

In the case of polynucleotides used to inhibit expression of an endogenous gene, the 
introduced sequence need not be perfectly identical to a sequence of the target endogenous gene. 
The introduced polynucleotide sequence will typically be at least substantially identical to the 
target endogenous sequence. 

Some polynucleotide SDFs in Tables 1 and 2 represent sequences that are expressed 
in corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus the invention includes
using these sequences to generate antisense constructs to inhibit translation and/or 
degradation of transcripts of said SDFs, typically in a plant cell. 

To accomplish this, a polynucleotide segment from the desired gene that can hybridize to 
the mRNA expressed from the desired gene (the "antisense segment") is operably linked to a
promoter such that the antisense strand of RNA will be transcribed when the construct is present 
in a host cell. A regulated promoter can be used in the construct to control transcription of the 
antisense segment so that transcription occurs only under desired circumstances. 
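
The orientation described above can be sketched in Python, for illustration only. The sketch assumes the sense (mRNA-like) strand of the chosen gene fragment is available as a DNA string; the function names are hypothetical. The reverse complement of that fragment is the sequence read 5' to 3' downstream of the promoter in the construct, and the corresponding RNA is the antisense transcript that can pair with the mRNA.

```python
# Complement map for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """RNA transcribed when the fragment is placed in antisense orientation."""
    return reverse_complement(sense_dna).replace("T", "U")
```

With the six-nucleotide sense fragment ATGGCC, the construct would carry GGCCAT downstream of the promoter and the antisense transcript would read GGCCAU.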

The antisense segment to be introduced generally will be substantially identical to at 
least a fragment of the endogenous gene or genes to be repressed. The sequence, however, need 
not be perfectly identical to inhibit expression. Further, the antisense product may hybridize to 
the untranslated region instead of or in addition to the coding sequence of the gene. The vectors 
of the present invention can be designed such that the inhibitory effect applies to other proteins 
within a family of genes exhibiting homology or substantial homology to the target gene. 

For antisense suppression, the introduced antisense segment sequence also need not 
be full length relative to either the primary transcription product or the fully processed 
mRNA. Generally, a higher percentage of sequence identity can be used to compensate for 
the use of a shorter sequence. Furthermore, the introduced sequence need not have the same 
intron or exon pattern, and homology of non-coding segments may be equally effective. 
Normally, a sequence of between about 30 or 40 nucleotides and the full length of the
transcript can be used, though a sequence of at least about 100 nucleotides is preferred, a
sequence of at least about 200 nucleotides is more preferred, and a sequence of at least about
500 nucleotides is especially preferred.
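
The length preferences stated above can be summarized, as an illustration only, by a small Python helper. The function name and the returned labels are hypothetical, and the sketch treats the approximate values in the text ("about 30", "about 100" and so on) as hard cutoffs, which is a simplifying assumption.

```python
def antisense_length_tier(length_nt: int, transcript_length_nt: int) -> str:
    """Classify an antisense segment length against the stated preferences."""
    if length_nt <= 0 or length_nt > transcript_length_nt:
        raise ValueError("segment length must be between 1 and the transcript length")
    # Assumption: "about N nucleotides" is treated as exactly N.
    if length_nt >= 500:
        return "especially preferred"
    if length_nt >= 200:
        return "more preferred"
    if length_nt >= 100:
        return "preferred"
    if length_nt >= 30:
        return "usable"
    return "below the stated range"
```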

C.2. Ribozymes 

It is also contemplated that gene constructs representing ribozymes and based on the 
SDFs in Tables 1 and 2 are an object of the invention. Ribozymes can also be used to inhibit 
expression of genes by suppressing the translation of the mRNA into a polypeptide. It is 
possible to design ribozymes that specifically pair with virtually any target RNA and cleave the 
phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. 
In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling 
and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences 
within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the 
activity of the constructs. 

A number of classes of ribozymes have been identified. One class of ribozymes is 
derived from a number of small circular RNAs, which are capable of self-cleavage and 
replication in plants. The RNAs replicate either alone (viroid RNAs) or with a helper virus 
(satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite 
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, 
solanum nodiflorum mottle virus and subterranean clover mottle virus. The design and use of 
target RNA-specific ribozymes is described in Haseloff et al., Nature, 334:585 (1988).

Like the antisense constructs above, the ribozyme sequence fragment necessary for 
pairing need not be identical to the target nucleotides to be cleaved, nor identical to the 
sequences in Tables 1 and 2. Ribozymes may be constructed by combining the ribozyme 
sequence and some fragment of the target gene which would allow recognition of the target 
gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the ribozyme
capable of binding to the target sequence exhibits at least 80%, preferably at least 85%,
more preferably at least 90%, most preferably at least 95%, and even more preferably at
least 96%, 97%, 98% or 99% sequence identity
to some fragment of a sequence in Tables 1 and 2 or the complement thereof. The ribozyme
can be equally effective in inhibiting mRNA translation by cleaving either in the untranslated 
or coding regions. Generally, a higher percentage of sequence identity can be used to 
compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not
have the same intron or exon pattern, and homology of non-coding segments may be equally
effective.

C.3. Chimeraplasts 

The SDFs of the invention, such as those described by Tables 1 and 2, can also be 
used to construct chimeraplasts that can be introduced into a cell to produce at least one 
specific nucleotide change in a sequence corresponding to the SDF of the invention. A 
chimeraplast is an oligonucleotide comprising DNA and/or RNA that specifically hybridizes 
to a target region in a manner which creates a mismatched base-pair. This mismatched base- 
pair signals the cell's repair enzyme machinery which acts on the mismatched region 
resulting in the replacement, insertion or deletion of designated nucleotide(s). The altered 
sequence is then expressed by the cell's normal cellular mechanisms. Chimeraplasts can be 
designed to repair mutant genes, modify genes, introduce site-specific mutations, and/or act
to interrupt or alter normal gene function (U.S. Pat. Nos. 6,010,907 and 6,004,804; and PCT
Pub. Nos. WO99/58723 and WO99/07865).

C.4. Sense Suppression 

The SDFs of Tables 1 and 2 of the present invention are also useful to modulate gene 
expression by sense suppression. Sense suppression represents another method of gene 
suppression by introducing at least one exogenous copy or fragment of the endogenous 
sequence to be suppressed. 

Introduction of expression cassettes in which a nucleic acid is configured in the sense 
orientation with respect to the promoter into the chromosome of a plant or by a self-replicating 
virus has been shown to be an effective means by which to induce degradation of mRNAs of 
target genes. For an example of the use of this method to modulate expression of endogenous
genes, see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patent Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the 
introduced sequence. 

For sense suppression, the introduced sequence generally will be substantially identical 
to the endogenous sequence intended to be inactivated. The minimal percentage of sequence 
identity will typically be greater than about 65%, but a higher percentage of sequence identity 
might exert a more effective reduction in the level of normal gene products. Sequence identity 
of more than about 80% is preferred, though about 95% to absolute identity would be most 
preferred. As with antisense regulation, the effect would likely apply to any other proteins
within a similar family of genes exhibiting homology or substantial homology to the suppressing
sequence. 

C.5. Transcriptional Silencing 

The nucleic acid sequences of the invention, including the SDFs of Tables 1 and 2, and 
fragments thereof, contain sequences that can be inserted into the genome of an organism 
resulting in transcriptional silencing. Such regulatory sequences need not be operatively linked 
to coding sequences to modulate transcription of a gene. Specifically, a promoter sequence 
without any other element of a gene can be introduced into a genome to transcriptionally silence 
an endogenous gene (see, for example, Vaucheret, H et al. (1998) The Plant Journal 16: 651- 
659). As another example, triple helices can be formed using oligonucleotides based on 
sequences from Tables 1 and 2, fragments thereof, and substantially similar sequence thereto. 
The oligonucleotide can be delivered to the host cell and can bind to the promoter in the genome 
to form a triple helix and prevent transcription. An oligonucleotide of interest is one that can 
bind to the promoter and block binding of a transcription factor to the promoter. In such a case, 
the oligonucleotide can be complementary to the sequences of the promoter that interact with 
transcription binding factors. 



C.6. Other Methods to Inhibit Gene Expression 

Yet another means of suppressing gene expression is to insert a polynucleotide into the 
gene of interest to disrupt transcription or translation of the gene. 

Low frequency homologous recombination can be used to target a polynucleotide insert 
to a gene by flanking the polynucleotide insert with sequences that are substantially similar to 
the gene to be disrupted. Sequences from Tables 1 and 2, fragments thereof, and substantially 
similar sequence thereto can be used for homologous recombination. 

In addition, random insertion of polynucleotides into a host cell genome can also be used
to disrupt the gene of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this
method, screening for clones from a library containing random insertions is preferred to
identify those that have polynucleotides inserted into the gene of interest. Such screening can
be performed using probes and/or primers described above based on sequences from Tables 1
and 2, fragments thereof, and substantially similar sequence thereto. The screening can also be
performed by selecting clones or R1 plants having a desired phenotype.

I.D. Methods of Functional Analysis

The constructs described in the methods under I.C. above can be used to determine 
the function of the polypeptide encoded by the gene that is targeted by the constructs. 

Down-regulating the transcription and translation of the targeted gene in the host cell 
or organisms, such as a plant, may produce phenotypic changes as compared to a wild-type 
cell or organism. In addition, in vitro assays can be used to determine if any biological 
activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc., is being
modulated by the down-regulation of the targeted gene. 

Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic 
trait, is sometimes necessary to obtain a desired phenotype. SDFs of the invention 
representing transcription activation and DNA binding domains can be assembled into hybrid 
transcriptional activators. These hybrid transcriptional activators can be used with their 
corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect 
coordinated expression of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992);
A. Martinez et al., Mol. Gen. Genet. 261:546 (1999)).

The SDFs of the invention can also be used in the two-hybrid genetic systems to 
identify networks of protein-protein interactions (L. McAlister-Henn et al., Methods 19:330
(1999); J.C. Hu et al., Methods 20:80 (2000); M. Golovkin et al., J. Biol. Chem. 274:36428
(1999); K. Ichimura et al., Biochem. Biophys. Res. Comm. 253:532 (1998)). The SDFs of the
invention can also be used in various expression display methods to identify important
protein-DNA interactions (e.g., B. Luo et al., J. Mol. Biol. 266:479 (1997)).

I.E. Promoters 

The SDFs of the invention are also useful as structural or regulatory sequences in a 
construct for modulating the expression of the corresponding gene in a plant or other organism, 
e.g., a symbiotic bacterium. For example, promoter sequences associated with SDFs of Tables 1
and 2 of the present invention can be useful in directing expression of coding sequences either as 
constitutive promoters or to direct expression in particular cell types, tissues, or organs or in 
response to environmental stimuli. 

With respect to the SDFs of the present invention a promoter is likely to be a relatively 
small portion of a genomic DNA (gDNA) sequence located in the first 2000 nucleotides
upstream from an initial exon identified in a gDNA sequence or initial "ATG" or methionine
codon or translational start site in a corresponding cDNA sequence. Such promoters are more 
likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon 
or translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, 
the promoter is usually located upstream of the transcription start site. The fragments of a 
particular gDNA sequence that function as elements of a promoter in a plant cell will preferably 
be found to hybridize to gDNA sequences presented and described in Tables 1 and 2 at medium 
or high stringency, relevant to the length of the probe and its base composition. 
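
By way of a hedged illustration, the candidate promoter region described above can be cut out of a gDNA string once the position of the initial ATG (or other start site) is known. The function names are hypothetical, positions are 0-based, and locating the start site by a plain text search for the first ATG is a simplification that ignores introns and upstream out-of-frame ATGs.

```python
def first_atg(sequence: str) -> int:
    """Index of the first ATG in the sequence, or -1 if none is present."""
    return sequence.upper().find("ATG")


def upstream_window(gdna: str, start_index: int, window: int = 2000) -> str:
    """Return up to `window` nucleotides immediately 5' of `start_index`."""
    if not 0 <= start_index <= len(gdna):
        raise ValueError("start_index lies outside the sequence")
    if window < 0:
        raise ValueError("window must be non-negative")
    # The slice is truncated when fewer than `window` nucleotides are available.
    return gdna[max(0, start_index - window):start_index]
```

Calling upstream_window with window=2000 gives the broader region named in the text, and window=1000 gives the narrower region in which promoters are said to be more likely found.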

Promoters are generally modular in nature. Promoters can consist of a basal promoter 
that functions as a site for assembly of a transcription complex comprising an RNA polymerase, 
for example RNA polymerase II. A typical transcription complex will include additional factors 
such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the only one to bind DNA
directly. The promoter might also contain one or more enhancers and/or suppressors that 
function as binding sites for additional transcription factors that have the function of modulating 
the level of transcription with respect to tissue specificity and of transcriptional responses to 
particular environmental or nutritional factors, and the like. 

Short DNA sequences representing binding sites for proteins can be separated from each 
other by intervening sequences of varying length. For example, within a particular functional 
module, protein binding sites may be constituted by regions of 5 to 60, preferably 10 to 30, more 
preferably 10 to 20 nucleotides. Within such binding sites, there are typically 2 to 6 nucleotides 
that specifically contact amino acids of the nucleic acid binding protein. The protein binding 
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 
to 150 nucleotides, often by 20 to 50 nucleotides. DNA binding sites in promoter elements often 
display dyad symmetry in their sequence. Often elements binding several different proteins, 
and/or a plurality of sites that bind the same protein, will be combined in a region of 50 to 1,000 
basepairs. 
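
The dyad symmetry mentioned above can be illustrated with a minimal Python check, offered as a sketch only. It assumes the simplest case, a perfect palindrome in which the site reads the same as its own reverse complement; real binding sites are often imperfect or contain a central spacer, which this check does not handle. The function names are hypothetical.

```python
# Complement map for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site is identical to its own reverse complement."""
    site = site.upper()
    return len(site) > 0 and site == reverse_complement(site)
```

For instance, the hexamers GAATTC and CACGTG pass this check, whereas GATTACA does not.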

Elements that have transcription regulatory function can be isolated from their 
corresponding endogenous gene, or the desired sequence can be synthesized, and recombined in 
constructs to direct expression of a coding region of a gene in a desired tissue-specific, temporal- 
specific or other desired manner of inducibility or suppression. When hybridizations are 
performed to identify or isolate elements of a promoter by hybridization to the long sequences 
presented in Tables 1 and 2, conditions are adjusted to account for the above-described nature of 
promoters. For example, short probes, constituting the element sought, are preferably used under
low temperature and/or high salt conditions. When long probes, which might include several
promoter elements, are used, low to medium stringency conditions are preferred when
hybridizing to promoters across species.

If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or 
fragment of a promoter, then nucleotide substitutions, insertions or deletions that do not 
substantially affect the binding of relevant DNA binding proteins would be considered 
equivalent to the exemplified nucleotide sequence. It is envisioned that there are instances 
where it is desirable to decrease the binding of relevant DNA binding proteins to silence or 
down-regulate a promoter, or conversely to increase the binding of relevant DNA binding 
proteins to enhance or up-regulate a promoter and vice versa. In such instances, 
polynucleotides representing changes to the nucleotide sequence of the DNA-protein contact 
region by insertion of additional nucleotides, changes to identity of relevant nucleotides, 
including use of chemically-modified bases, or deletion of one or more nucleotides are 
considered encompassed by the present invention. In addition, fragments of the promoter 
sequences described by Tables 1 and 2 and variants thereof can be fused with other promoters 
or fragments to facilitate transcription and/or transcription in specific types of cells or under
specific conditions. 

Promoter function can be assayed by methods known in the art, preferably by 
measuring activity of a reporter gene operatively linked to the sequence being tested for 
promoter function. Examples of reporter genes include those encoding luciferase, green 
fluorescent protein, GUS, neo, cat and bar. 

I.F. UTRs and Junctions 

Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are 
also within the scope of the invention. UTR sequences include introns and 5' or 3' untranslated
regions (5' UTRs or 3' UTRs). Fragments of the sequences shown in Tables 1 and 2 can
comprise UTRs and intron/exon junctions. 

These fragments of SDFs, especially UTRs, can have regulatory functions related to, for 
example, translation rate and mRNA stability. Thus, these fragments of SDFs can be isolated 
for use as elements of gene constructs for regulated production of polynucleotides encoding 
desired polypeptides. 

Introns of genomic DNA segments might also have regulatory functions. Sometimes 
regulatory elements, especially transcription enhancer or suppressor elements, are found 
within introns. Also, elements related to stability of heteronuclear RNA and efficiency of 
splicing and of transport to the cytoplasm for translation can be found in intron elements.
Thus, these segments can also find use as elements of expression vectors intended for use to
transform plants. 

Just as with promoters, UTR sequences and intron/exon junctions can vary from those
shown in Tables 1 and 2. Such changes from those sequences preferably will not affect the 
regulatory activity of the UTRs or intron/exon junction sequences on expression, 
transcription, or translation unless selected to do so. However, in some instances, down- or 
up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro 
activity. 

I.G. Coding Sequences

Isolated polynucleotides of the invention can include coding sequences that encode 
polypeptides comprising an amino acid sequence encoded by sequences in Tables 1 and 2 or 
an amino acid sequence presented in Tables 1 and 2. 

A nucleotide sequence encodes a polypeptide if a cell (or a cell free in vitro system) 
expressing that nucleotide sequence produces a polypeptide having the recited amino acid 
sequence when the nucleotide sequence is transcribed and the primary transcript is 
subsequently processed and translated by a host cell (or a cell free in vitro system) harboring 
the nucleic acid. Thus, an isolated nucleic acid that encodes a particular amino acid sequence 
can be a genomic sequence comprising exons and introns or a cDNA sequence that represents 
the product of splicing thereof. An isolated nucleic acid encoding an amino acid sequence 
also encompasses heteronuclear RNA, which contains sequences that are spliced out during 
expression, and mRNA, which lacks those sequences. 

Coding sequences can be constructed using chemical synthesis techniques or by 
isolating coding sequences or by modifying such synthesized or isolated coding sequences as 
described above. 

In addition to coding sequences encoding the polypeptide sequences of Tables 1 and 
2, which are native to corn, Arabidopsis, soybean, rice, wheat, and other plants, the isolated
polynucleotides can be polynucleotides that encode variants, fragments, and fusions of those 
native proteins. Such polypeptides are described below in part II. 

In variant polynucleotides generally, the number of substitutions, deletions or insertions 
is preferably less than 20%, more preferably less than 15%; even more preferably less than 10%, 
5%, 3% or 1% of the number of nucleotides comprising a particularly exemplified sequence. It 
is generally expected that non-degenerate nucleotide sequence changes that result in 1 to 10, 
more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or 
substitutions will not greatly affect the function of an encoded polypeptide. The most preferred 
embodiments are those wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides 
are added to, deleted from and/or substituted in the sequences specifically disclosed in Tables 1 
and 2. 

Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide 
preferably preserve the reading frame. This consideration is not so important in instances when 
the polynucleotide is intended to be used as a hybridization probe. 
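
The reading-frame consideration above can be illustrated with a minimal Python sketch. The function name and the example sequences are hypothetical and are not taken from Tables 1 and 2. The check only tests whether the net length change is a multiple of three nucleotides; this is a necessary condition for frame preservation, not a guarantee that every codon between two compensating edits is unchanged.

```python
def preserves_reading_frame(original: str, modified: str) -> bool:
    """Return True if the net insertion/deletion length is a multiple of 3.

    Illustrative assumption: both arguments are coding sequences that start
    in the same frame, and only the net length change is examined.
    """
    return (len(modified) - len(original)) % 3 == 0


# Hypothetical example sequences (not from the specification).
original = "ATGGCTGAATAA"        # ATG GCT GAA TAA
in_frame = "ATGGCTGGTGAATAA"     # three nucleotides (GGT) inserted
out_of_frame = "ATGGCTGGAATAA"   # one nucleotide (G) inserted

print(preserves_reading_frame(original, in_frame))
print(preserves_reading_frame(original, out_of_frame))
```

For a polynucleotide used only as a hybridization probe, such a check would simply be skipped.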



II. Polypeptides and Proteins 

IIA. Native polypeptides and proteins 

Polypeptides within the scope of the invention include both native proteins as well as 
variants, fragments, and fusions thereof. Polypeptides of the invention are those encoded by 
any of the six reading frames of sequences shown in Tables 1 and 2, preferably encoded by 
the three frames reading in the 5' to 3' direction of the sequences as shown. 

Native polypeptides include the proteins encoded by the sequences shown in Tables 1 
and 2. Such native polypeptides include those encoded by allelic variants. 
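
The six reading frames mentioned above (three on the strand as shown and three on its reverse complement) can be enumerated with a short Python sketch. It assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names and the example sequence are illustrative only and do not come from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# Hypothetical example sequence.
for label, peptide in six_frame_translations("ATGGCCTAA").items():
    print(label, peptide)
```

The three "+" frames correspond to the preferred frames reading in the 5' to 3' direction of the sequence as shown.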

Polypeptide and protein variants will exhibit at least 75% sequence identity to those 
native polypeptides of Tables 1 and 2. More preferably, the polypeptide variants will exhibit at 
least 85% sequence identity; even more preferably, at least 90% sequence identity; more 
preferably at least 95%, 96%, 97%, 98%, or 99% sequence identity. Fragments of polypeptide 
or fragments of polypeptides will exhibit similar percentages of sequence identity to the 
relevant fragments of the native polypeptide. Fusions will exhibit a similar percentage of 
sequence identity in that fragment of the fusion represented by the variant of the native peptide. 
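
As a rough illustration of how the identity thresholds above might be checked, the following Python sketch computes percent identity over two sequences that have already been aligned. The specification does not fix the alignment method or the denominator here, so both are assumptions: the sequences are taken as pre-aligned with "-" for gaps, and identity is counted over all aligned columns. The function name and example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: identical residues are divided by the number of alignment
    columns, ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptides: one substitution in eight positions.
identity = percent_identity("MKTAYIAK", "MKTAHIAK")
print(identity, identity >= 75.0)
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so the convention should be stated whenever a threshold such as 75% is applied.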
Furthermore, polypeptide variants will exhibit at least one of the functional properties of 
the native protein. Such properties include, without limitation, protein interaction, DNA 
interaction, biological activity, immunological activity, receptor binding, signal transduction, 
transcription activity, growth factor activity, secondary structure, three-dimensional structure, 
etc. As to properties related to in vitro or in vivo activities, the variants preferably exhibit at least 
60% of the activity of the native protein; more preferably at least 70%, even more preferably at 
least 80%, 85%, 90% or 95% of at least one activity of the native protein. 

One type of variant of native polypeptides comprises amino acid substitutions, deletions 
and/or insertions. Conservative substitutions are preferred to maintain the function or activity of 
the polypeptide. 

Within the scope of percentage of sequence identity described above, a polypeptide of 
the invention may have additional individual amino acids or amino acid sequences inserted into 
the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends thereof. 
Likewise, some of the amino acids or amino acid sequences may be deleted from the 
polypeptide. 

A.1 Antibodies 

Isolated polypeptides can be utilized to produce antibodies. Polypeptides of the invention 
can generally be used, for example, as antigens for raising antibodies by known techniques. The 
resulting antibodies are useful as reagents for determining the distribution of the antigen protein 
within the tissues of a plant or within a cell of a plant. The antibodies are also useful for 
examining the production level of proteins in various tissues, for example in a wild-type plant or 
following genetic manipulation of a plant, by methods such as Western blotting. 

Antibodies of the present invention, both polyclonal and monoclonal, may be prepared 
by conventional methods. In general, the polypeptides of the invention are first used to 
immunize a suitable animal, such as a mouse, rat, rabbit, or goat. Rabbits and goats are 
preferred for the preparation of polyclonal sera due to the volume of serum obtainable, and the 
availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is 
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant 
such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally 
(generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection is typically 
sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections of the 
protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively 
generate antibodies by in vitro immunization using methods known in the art, which for the 
purposes of this invention is considered equivalent to in vivo immunization. 

Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic 
container, incubating the blood at 25°C for one hour, followed by incubating the blood at 4°C 
for 2-18 hours. The serum is recovered by centrifugation (e.g., 1,000 x g for 10 minutes). About 
20-50 ml per bleed may be obtained from rabbits. 

Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 
256: 495 (1975), or modification thereof. Typically, a mouse or rat is immunized as described 
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally 
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen 
cells can be screened (after removal of nonspecifically adherent cells) by applying a cell 
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane- 
bound immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with 
the rest of the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to 
fuse with myeloma cells to form hybridomas, and are cultured in a selective medium (e.g., 
hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated 
by limiting dilution, and are assayed for the production of antibodies which bind specifically to 
the immunizing antigen (and which do not bind to unrelated antigens). The selected Mab- 
secreting hybridomas are then cultured either in vitro (e.g., in tissue culture bottles or hollow 
fiber reactors), or in vivo (as ascites in mice). 

Other methods for sustaining antibody-producing B-cell clones, such as by EBV 
transformation, are known. 

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using 
conventional techniques. Suitable labels include fluorophores, chromophores, radioactive atoms 
(particularly 32P and 125I), electron-dense reagents, enzymes, and ligands having specific binding 
partners. Enzymes are typically detected by their activity. For example, horseradish peroxidase 
is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue 
pigment, quantifiable with a spectrophotometer. 

A.2 In Vitro Applications of Polypeptides 

Some polypeptides of the invention will have enzymatic activities that are useful in vitro. 
For example, the soybean trypsin inhibitor (Kunitz) family is one of the numerous families of 
proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine 
proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases. 
Thus, these peptides find in vitro use in protein purification protocols and perhaps in 
therapeutic settings requiring topical application of protease inhibitors. 

Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second 
step in the biosynthesis of heme, the condensation of two molecules of 5-aminolevulinate to 
form porphobilinogen, and is also involved in chlorophyll biosynthesis (Kaczor et al. (1994) 
Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider (1976) Z. 
Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of 
heme derivatives. Enzymes of biosynthetic pathways generally can be used as catalysts for in 
vitro synthesis of the compounds representing products of the pathway. 

Polypeptides encoded by SDFs of the invention can be engineered to provide 
purification reagents to identify and purify additional polypeptides that bind to them. This 
allows one to identify proteins that function as multimers or elucidate signal transduction or 
metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used in a 
similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. 
Biochem. 229:99 (1995), S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin 
et al., J. Biol. Chem. 272:27274 (1997)). 

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS 

Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum 
length sequence (MLS) can exhibit at least one of the activities of the identified domains and/or 
related polypeptides described in Sections (C) and (D) of Table 1 corresponding to the MLS of 
interest. 

II.B.(1) Variants 

A type of variant of the native polypeptides comprises amino acid substitutions. 
Conservative substitutions, described above (see II.), are preferred to maintain the function or 
activity of the polypeptide. Such substitutions include conservation of charge, polarity, 
hydrophobicity, size, etc. For example, one or more amino acid residues within the sequence 
can be substituted with another amino acid of similar polarity that acts as a functional 
equivalent, for example providing a hydrogen bond in an enzymatic catalysis. Substitutes for an 
amino acid within an exemplified sequence are preferably made among the members of the class 
to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include 
alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The 
polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and 
glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. 
The negatively charged (acidic) amino acids include aspartic acid and glutamic acid. 
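
The four amino acid classes listed above can be captured in a small lookup table. The following Python sketch is one possible way to test whether a substitution stays within a class; the one-letter codes, dictionary name, and function names are illustrative choices rather than part of the specification, and membership in the same class is used here as the only criterion for "conservative".

```python
# Classes exactly as listed in the text, in one-letter amino acid codes.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues belong to the same class."""
    return residue_class(original) == residue_class(substitute)


print(is_conservative("L", "I"))  # leucine to isoleucine
print(is_conservative("K", "E"))  # lysine to glutamic acid
```

Finer-grained criteria, such as size or hydrogen-bonding capacity, would need additional tables beyond this class-based check.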

Within the scope of percentage of sequence identity described above, a polypeptide of 
the invention may have additional individual amino acids or amino acid sequences inserted into 
the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends thereof. 
Likewise, some of the amino acids or amino acid sequences may be deleted from the 
polypeptide. Amino acid substitutions may also be made in the sequences; conservative 
substitutions being preferred. 

One preferred class of variants is those that comprise (1) the domain of an 
encoded polypeptide and/or (2) residues conserved between the encoded polypeptide and 
related polypeptides. For this class of variants, the encoded polypeptide sequence is changed 
by insertion, deletion, or substitution at positions flanking the domain and/or conserved 
residues. 

Another class of variants includes those that comprise an encoded polypeptide 
sequence that is changed in the domain or conserved residues by a conservative substitution. 

Yet another class of variants includes those that lack one of the in vitro 
activities, or structural features of the encoded polypeptides. One example is polypeptides or 
proteins produced from genes comprising dominant negative mutations. Such a variant may 
comprise an encoded polypeptide sequence with non-conservative changes in a particular 
domain or group of conserved residues. 

II.B.(2) FRAGMENTS 

Fragments of particular interest are those that comprise a domain identified for a 
polypeptide encoded by an MLS of the instant invention and variants thereof. Also, fragments 
that comprise at least one region of residues conserved between an MLS encoded polypeptide 
and its related polypeptides are of great interest. Fragments are sometimes useful in the same 
way that polypeptides corresponding to genes comprising dominant negative mutations are. 

II.B.(3) FUSIONS 

Of interest are chimeras comprising (1) a fragment of the MLS encoded 
polypeptide or variants thereof of interest and (2) a fragment of a polypeptide comprising the 
same domain. An example is an AP2 helix encoded by an MLS of the invention fused to a 
second AP2 helix from the ANT protein, which comprises two AP2 helices. The present 
invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments 
thereof fused with related proteins or fragments thereof. 
DEFINITION OF DOMAINS 

The polypeptides of the invention may possess identifying domains as shown in Table 
1. Specific domains within the MLS encoded polypeptides are indicated in Table 1. In 
addition, the domains within the MLS encoded polypeptide can be defined by the region that 
exhibits at least 70% sequence identity with the consensus sequences listed in the detailed 
description below of each of the domains. 

The majority of the protein domain descriptions given below are obtained from 
Prosite (http://www.expasy.ch/prosite/) and Pfam 
(http://pfam.wustl.edu/browse.shtml). 

1. (AAA) AAA-protein family signature 

A large family of ATPases has been described [1 to 5] whose key feature is that 
they share a conserved region of about 220 amino acids that contains an ATP-binding site. 
This family is now called AAA, for ATPases Associated with diverse cellular 
Activities. The proteins that belong to this family either contain one or two AAA 
domains. Proteins containing two AAA domains: 

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the 
fungal homolog SEC18. These proteins are involved in intracellular transport between 
the endoplasmic reticulum and Golgi, as well as between different Golgi cisternae. 

- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or 
VCP) which is involved in the transfer of membranes from the endoplasmic reticulum to 
the Golgi apparatus. This protein forms a ring-shaped homooligomer composed of six 
subunits. The yeast homolog is CDC48 and it may play a role in spindle pole 
proliferation. 

- Yeast protein PAS1, essential for peroxisome assembly and the related protein PAS1 
from Pichia pastoris. 

- Yeast protein AFG2. 

- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH which may 
be part of a transduction pathway connecting light to cell division. 

Proteins containing a single AAA domain: 

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc 
metallopeptidase that seems to degrade the heat-shock sigma-32 factor. 

It is an integral membrane protein with a large cytoplasmic C-terminal domain that 
contains both the AAA and the protease domains. 

- Yeast protein YME1, a protein important for maintaining the integrity of the 
mitochondrial compartment. YME1 is also a zinc-dependent protease. 

- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain 
followed by a zinc-dependent protease domain. 

- Subunits from the regulatory complex of the 26S proteasome [6] which is involved in 
the ATP-dependent degradation of ubiquitinated proteins: 

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene 
YTA5) and fission yeast (gene mts2). 

b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in 
yeast (gene YTA2). 

c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in 
yeast (gene CIM5 or YTA3). 

d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in 
yeast (SUG1 or CIM3 or TBY1) and fission yeast (gene let1). 

Other probable subunits include human TBP1, which seems to influence HIV gene 
expression by interacting with the virus tat transactivator protein, and yeast YTA1 and YTA6. 

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the 
Rieske iron-sulfur protein. 

- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins. 

- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris 
and PAY4 from Yarrowia lipolytica. 

- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06). 

- Caenorhabditis elegans meiotic spindle formation protein mei-1. 

- Yeast protein SAP1. 

- Yeast protein YTA7. 

- Mycobacterium leprae hypothetical protein A2126A. 

It is proposed that, in general, the AAA domains in these proteins act as ATP- 
dependent protein clamps [5]. In addition to the ATP-binding 'A' and 'B' motifs, which are 
located in the N-terminal half of this domain, there is a highly conserved region located in the 
central part of the domain which was used to develop a signature pattern. 

Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]- 
D-x-A-[LIFA]-x-R 
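
Consensus patterns such as the one above use PROSITE pattern notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. A minimal Python sketch of converting this notation to a regular expression is shown below. It covers only these elements (anchors such as `<` and `>` are not handled), and the function name and test peptide are hypothetical.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regex string."""
    parts = []
    # Remove whitespace so patterns wrapped across lines are accepted.
    for element in "".join(pattern.split()).split("-"):
        if not element:
            continue
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


AAA_PATTERN = (
    "[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]- "
    "D-x-A-[LIFA]-x-R"
)
aaa_regex = prosite_to_regex(AAA_PATTERN)

# Hypothetical peptide constructed to satisfy the pattern.
hit = re.search(aaa_regex, "MK" + "LALLAGSNAAAALDAALAR" + "GG")
print(aaa_regex)
print(hit.start() if hit else None)
```

The same conversion can be applied to the other consensus patterns in this section, including those that use `{...}` exclusions.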

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell 
Biol. 114:443-453(1991). 

[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. 
Cell 64:499-510(1991). 

[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990). 

[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., 
Wiebel F.F. Biochimie 75:209-224(1993). 

[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995). 

[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996). 

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of 
six transmembrane helices. Many members of the ABC transporter family (ABC tran) have 
two such regions. See also descriptions of ABC Tran, below, and ABC2 membrane, above. 



3. (ABC Tran) ABC transporters family signature. On the basis of sequence similarities a 
family of related ATP-binding proteins has been characterized [1 to 5]. These proteins are 
associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, 
but a majority of them are involved in active transport of small hydrophilic molecules across 
the cytoplasmic membrane. All these proteins share a conserved domain of some two 
hundred amino acid residues, which includes an ATP-binding site. These proteins are 
collectively known as ABC transporters. Proteins known to belong to this family are listed 
below (references are only provided for recently determined sequences). 

In prokaryotes: 

- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); 
arabinose (araG); arginine (artP); dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); 
ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ); glycerol-3-phosphate (ugpC); 
glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP); iron(III) (sfuC); 
iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG); 
maltose (malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide 
(amiE/amiF; oppD/oppF); peptide (sapD/sapF); phosphate (pstB); putrescine (potG); ribose 
(rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD). 
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB. 
- Colicin V export protein cvaB. 
- Lactococcin export protein lcnC [6]. 
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin). 
- Extracellular proteases B and C export protein prtD. 
- Alkaline protease secretion protein aprD. 
- Beta-(1,2)-glucan export proteins chvA and ndvA. 
- Haemophilus influenzae capsule-polysaccharide export protein bexA. 
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA). 
- Polysialic acid transport protein kpsT. 
- Cell division associated ftsE protein (function unknown). 
- Copper processing protein nosF from Pseudomonas stutzeri. 
- Nodulation protein nodI from Rhizobium (function unknown). 
- Escherichia coli proteins cydC and cydD. 
- Subunit A of the ABC excision nuclease (gene uvrA). 
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA). 
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7]. 
- Heterocyst differentiation protein (gene hetA) from Anabaena PCC 7120. 
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity 
transport system. 
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as 
Escherichia coli, Klebsiella pneumoniae, Pseudomonas putida, Rhizobium meliloti and 
Thiobacillus ferrooxidans. 
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, 
yddA, yehX, yejF, yheS, yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR. 

In eukaryotes: 

- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins 
which extrude a wide variety of drugs out of the cell (for a review see [8]). 
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably 
involved in the transport of chloride ions. 
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, 
PSF2, RING11, HAM-2, mtp2), which are involved in the transport of antigens from the 
cytoplasm to a membrane-bound compartment for association with MHC class I molecules. 
- 70 Kd peroxisomal membrane protein (PMP70). 
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9]. 
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium 
channel. 
- Drosophila proteins white (w) and brown (bw), which are involved in the import of 
ommatidium screening pigments. 
- Fungal elongation factor 3 (EF-3). 
- Yeast STE6 which is responsible for the export of the a-factor pheromone. 
- Yeast mitochondrial transporter ATM1. 
- Yeast MDL1 and MDL2. 
- Yeast SNQ2. 
- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or YDR1). 
- Fission yeast heavy metal tolerance protein hmt1. This protein is probably involved in 
the transport of metal-bound phytochelatins. 
- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2). 
- Fission yeast leptomycin B resistance protein (gene pmd1). 
- mbpX, a hypothetical chloroplast protein from Liverwort. 
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: 
an N-terminal subtilase catalytic domain and a C-terminal ABC transporter domain. 

As a signature pattern for this class of proteins, a conserved region which 
is located between the 'A' and the 'B' motifs of the ATP-binding site was used. 

Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]- 
[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIV 
[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA] 

The ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, 
CFTR, pmd1 and in EF-3. In some of those proteins, the above pattern only detects one of the 
two copies of the domain. The proteins belonging to this family also contain one or two copies 
of the ATP-binding motifs 'A' and 'B'. 

[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. 
Bioenerg. Biomembr. 22:571-592(1990). 

[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988). 

[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland 
I.B., Gray L., Buckels S.D., Bell A.W., Hermodson M.A. Nature 323:448-450(1986). 

[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 
323:451-453(1986). 

[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990). 

[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. 
Microbiol. 58:1952-1961(1992). 

[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991). 

[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988). 

[9] Valle D., Gaertner J. Nature 361:682-683(1993). 

[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, 
Gonzalez G., Herrera-Sosa H., Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995). 



4. (ACBP) 

Acyl-CoA-binding protein signature 

Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long- 
chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of 
acyl-CoA esters [1]. ACBP is also known as diazepam binding inhibitor (DBI) or endozepine 
(EP) because of its ability to displace diazepam from the benzodiazepine (BZD) recognition 
site located on the GABA type A receptor. It is therefore possible that this protein also acts as 
a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved 
protein of about 90 residues that has been so far found in vertebrates, insects and yeast. 
ACBP is also related to the N-terminal section of a probable transmembrane protein of 
unknown function which has been found in mammals. As a signature pattern, the region that 
corresponds to residues 19 to 37 in mammalian ACBP was selected. 

Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q- 
[STA](2)-x-G 

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 
89:11287-11291(1992). 

[ 2] Costa E., Guidotti A. Life Sci. 49:325-344(1991). 

5. (AIRS) 

AIR synthase related proteins 

This family includes Hydrogen expression/formation protein HypE, AIR synthases, FGAM 
synthase and selenide, water dikinase. 

6. (AMP-binding) 

Putative AMP-binding domain signature 

It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes which all 
probably act via an ATP-dependent covalent binding of AMP to their substrate, share a 
region of sequence similarity. These enzymes are: 

- Insect luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing 
the oxidation of luciferin in presence of ATP and molecular oxygen. 
- Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme catalyzes the 
activation of alpha-aminoadipate by ATP-dependent adenylation and the reduction of 
activated alpha-aminoadipate by NADPH. 
- Acetate--CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of 
acetyl-CoA from acetate and CoA. 
- Long-chain-fatty-acid-CoA ligase, an enzyme that activates long-chain fatty acids for 
both the synthesis of cellular lipids and their degradation via beta-oxidation. 
- 4-coumarate-CoA ligase (4CL), a plant enzyme that catalyzes the formation of 
4-coumarate-CoA from 4-coumarate and coenzyme A; the branchpoint reactions between 
general phenylpropanoid metabolism and pathways leading to various specific end products. 
- O-succinylbenzoic acid--CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial 
enzyme involved in the biosynthesis of menaquinone (vitamin K2). 
- 4-Chlorobenzoate--CoA ligase (EC 6.2.1.-) (4-CBA--CoA ligase) [7], a Pseudomonas 
enzyme involved in the degradation of 4-CBA. 
- Indoleacetate-lysine ligase (IAA-lysine synthetase) [8], an enzyme from Pseudomonas 
syringae that converts indoleacetate to IAA-lysine. 
- Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme 
catalyzes the ATP-dependent formation of a variety of C-24 bile acid-CoA. 
- Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC). 
- L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various 
fungi (gene acvA or pcbAB). This enzyme catalyzes the first step in the biosynthesis of 
penicillin and cephalosporin, the formation of ACV from the constituent amino acids. The 
amino acids seem to be activated by adenylation. It is a protein of around 3700 amino acids 
that contains three related domains of about 1000 amino acids. 
- Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the 
first step in the biosynthesis of the cyclic antibiotic gramicidin S, the ATP-dependent 
racemization of phenylalanine. 
- Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by 
tycA is identical to that catalyzed by grsA. 
- Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a 
multifunctional protein that activates and polymerizes proline, valine, ornithine and leucine. 
GrsB consists of four related domains. 
- Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia 
coli. These two enzymes are involved in the ATP-dependent activation of respectively 
2,3-dihydroxybenzoate and serine during enterobactin (enterochelin) biosynthesis. 
- Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. 
Subunits 1 and 2 contain three related domains while subunit 3 only contains a single domain. 
- HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme activates 
the four amino acids (Pro, L-Ala, D-Ala and 2-amino-9,10-epoxi-8-oxodecanoic acid) that 
make up HC-toxin, a cyclic tetrapeptide. HTS1 consists of four related domains. 

There are also some proteins, whose exact function is not yet known, but which are, 
very probably, also AMP-binding enzymes. These proteins are: 

- ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is 
not known but which shows a high degree of similarity with the above proteins. 
- AngR, a Vibrio anguillarum protein. AngR is thought to be a transcriptional activator 
which modulates the anguibactin (an iron-binding siderophore) biosynthesis gene cluster 
operon. But it is believed [9], that angR is not a DNA-binding protein, but rather an enzyme 
involved in the biosynthesis of anguibactin. This conclusion is based on three facts: the 
presence of the AMP-binding domain; the size of angR (1048 residues), which is far bigger 
than any bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase 
immediately downstream of angR. 
- A hypothetical protein in mmsB 3' region in Pseudomonas aeruginosa. 
- Escherichia coli hypothetical protein ydiD. 
- Yeast hypothetical protein YBR041w. 
- Yeast hypothetical protein YBR222c. 
- Yeast hypothetical protein YER147c. 

All these proteins contain a highly conserved region very rich in glycine, serine, and 
threonine which is followed by a conserved lysine. A parallel can be drawn between this type 
of domain and the G-x(4)-G-K-[ST] ATP-/GTP-binding 'P-loop' domain or the protein 
kinases G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domains. 

Consensus pattern: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR] 

In a majority of cases the residue that follows the Lys at the end of the pattern is a Gly. 

[1] Toh H. Protein Seq. Data Anal. 4:111-117(1991). 

[2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990). 

[3] Schroeder J. Nucleic Acids Res. 17:460-460(1989). 

[4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992). 

[5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992). 

[6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992). 

[7] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang 
K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992). 

[8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990). 

7. AP2 domain

This 60 amino acid residue domain can bind to DNA [1]. This domain is plant specific.
Members of this family are suggested to be related to pyridoxal phosphate-binding domains
such as found in aminotran_2 [3]. AP2 domains are also described in Jofuku et al., co-
pending U.S. Patent applications 08/700,152, 08/879,827, 08/912,272, 09/026,039.

[1] Ohme-takagi M, Shinshi H; Plant Cell 1995;7:173-182. 

[2] Weigel D; Plant Cell 1995;7:388-389. 

[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828.

8. ARID

The ARID domain is an AT-Rich Interaction domain sharing structural homology to DNA
replication and repair nucleases and polymerases.

[1] Herrscher RF, Kaplan MH, Lelsz DL, Das C, Scheuermann R, Tucker PW; Genes Dev 
1995;9:3067-3082. 

[2] Yuan YC, Whitson RH, Liu Q, Itakura K, Chen Y; Nat Struct Biol 1998;5:959-964. 

9. (ATP synt)

ATP synthase gamma subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the
cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid
membrane of chloroplasts. The ATPase complex is composed of an oligomeric
transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The
former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma,
delta and epsilon. Subunit gamma is believed to be important in regulating ATPase activity
and the flow of protons through the CF(0) complex. The best conserved region of the gamma
subunit [3] is its C-terminus, which seems to be essential for assembly and catalysis. As a
signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]
Note: Pea chloroplast gamma and two Bacillus species gamma subunits are not detected by
this motif.
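
The gamma subunit signature carries a positional condition in addition to the residue pattern: the last residue of the 14-residue segment lies one to three residues from the C-terminal extremity. A minimal Python sketch of how both conditions could be checked together is given below. The function name, its parameters and the toy sequences used to exercise it are assumptions made for illustration; only the pattern and the one-to-three residue window come from the text above.

```python
import re

# Regex form of [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]; the lookahead lets
# overlapping candidate segments be examined as well.
GAMMA_SIGNATURE = re.compile(r"(?=([IV]T.E.{2}[DE].{3}GA.[SAKR]))")

def gamma_signature_hit(sequence: str, min_tail: int = 1, max_tail: int = 3):
    """Return (start, tail) for the first segment that matches the pattern and
    ends min_tail..max_tail residues before the C-terminus, else None.

    start is the 0-based start of the segment; tail is the number of residues
    that follow the segment.
    """
    seq = sequence.upper()
    for m in GAMMA_SIGNATURE.finditer(seq):
        tail = len(seq) - m.end(1)   # residues remaining after the segment
        if min_tail <= tail <= max_tail:
            return m.start(1), tail
    return None
```

A segment that matches the residue pattern but sits in the middle of a sequence is rejected by the tail check, which mirrors the positional wording of the signature description.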

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).



10. (ATP Synt A)

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the
cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid
membrane of chloroplasts. The ATPase complex is composed of an oligomeric
transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core,
termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component
of the proton channel; it may play a direct role in translocating protons across the membrane.
It is a highly hydrophobic protein that has been predicted to contain 8 transmembrane regions
[3]. Sequence comparison of a subunits from all available sources reveals very few conserved
regions. The best conserved region is located in what is predicted to be the fifth
transmembrane domain. This region contains three perfectly conserved residues: an arginine,
a leucine and an asparagine. Mutagenesis experiments [4] indicate that the arginine is
important for the proton-translocating function of the ATPase. This region was
selected as a signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is
important for proton translocation]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Lewis M.L., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).

11. ATP synthase B

Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate
protons through the membrane (inner membrane in mitochondria, thylakoid membrane in plants,
cytoplasmic membrane in bacteria). The B subunits are thought to interact with the stalk of
the CF(1) subunits.

12. (ATP synt C) 

ATP synthase c subunit signature

ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic
membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of
chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector,
called CF(0), which acts as a proton channel, and a catalytic core, termed coupling factor
CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly
hydrophobic protein of about 8 Kd which has been implicated in the proton-conducting
activity of ATPase. Structurally, subunit c consists of two long terminal hydrophobic regions,
which probably span the membrane, and a central hydrophilic region. N,N'-
dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the
ATPase activity. DCCD binds to a specific glutamate or aspartate residue which is located in
the middle of the second hydrophobic region near the C-terminus of the protein. A signature
pattern which includes the DCCD-binding residue was derived.

Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D
or E binds DCCD]
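
Because the final position of the c subunit pattern is annotated as the DCCD-binding residue, a match can be used to report where that residue sits in a sequence. The sketch below is a hypothetical illustration in Python: the regular expression is a hand translation of the consensus pattern above, with the last [DE] captured, and the function name and any sequence passed to it are assumptions rather than part of the disclosure.

```python
import re

# Hand translation of [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE];
# the closing [DE] (annotated as binding DCCD) is captured as group 1.
C_SUBUNIT_RE = re.compile(r"[GSTA]R[NQ]P.{10}[LIVMFYW]{2}.{3}[LIVMFYW].([DE])")

def dccd_binding_residue(sequence: str):
    """Return (residue, 1-based position) of the putative DCCD-binding D or E
    in the first pattern match, or None when the pattern is absent."""
    m = C_SUBUNIT_RE.search(sequence.upper())
    if m is None:
        return None
    return m.group(1), m.start(1) + 1
```

The position is reported 1-based, following the usual convention for numbering residues in a protein sequence.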

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).
[4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATP synt DE) 

ATP synthase, Delta/Epsilon chain 

Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase.
The subunits are called delta and epsilon in human and metazoan species, but in bacterial
species the delta (D) subunit is the equivalent of the oligomycin sensitive subunit (OSCP) in
metazoans.

14. (ATP synt ab) 

ATP synthase alpha and beta subunits signature

ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic
membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of
chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector,
called CF(0), and a catalytic core, called coupling factor CF(1). The former acts as a proton
channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon. The
sequences of subunits alpha and beta are related and both contain a nucleotide-binding site
for ATP and ADP. The beta chain has catalytic activity, while the alpha chain is a regulatory
subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for acidifying a variety of
intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric
complexes of a transmembrane and a catalytic sector. The sequence of the largest subunit of
the catalytic sector (70 Kd) is related to that of the F-ATPase beta subunit, while a 60 Kd
subunit, from the same sector, is related to the F-ATPase alpha subunit [4]. Archaebacterial
membrane-associated ATPases are composed of three subunits. The alpha chain is related to
the F-ATPase beta chain and the beta chain is related to the F-ATPase alpha chain [4]. A
protein highly similar to F-ATPase beta subunits is found [5] in some bacterial apparatus
involved in a specialized protein export pathway that proceeds without signal peptide
cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella
flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To
detect these ATPase subunits, a segment of ten amino-acid residues, containing two conserved
serines, was selected as a signature pattern. The first serine seems to be important for
catalysis (in the ATPase alpha chain at least), as its mutagenesis causes catalytic impairment.

Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site
residue]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).
[4] Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J., Bowman B.J., Manolson
M.F., Poole R.J., Date T., Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad.
Sci. U.S.A. 86:6661-6665(1989).
[5] Dreyfus G., Williams A.W., Kawagishi I., Macnab R.M. J. Bacteriol. 175:3131-
3138(1993).

15. (ATP synt ab C)

ATP synthase ab C terminal.

Number of members: 190

[1] Abrahams JP, Leslie AG, Lutter R, Walker JE; "Structure at 2.8 A resolution of F1-
ATPase from bovine heart mitochondria." Nature 1994;370:621-628.

16. (A deaminase) 

Adenosine and AMP deaminase signature 

Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP
deaminase catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that
these two types of enzymes share three regions of sequence similarities; these regions are
centered on residues which are proposed to play an important role in the catalytic mechanism
of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected.

Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site
residues]

[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.K. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf) 
Acetyltransferase (GNAT) family. 

This family contains proteins with N-acetyltransferase functions.

[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.

18. (Aconitase C)
Aconitase family signature

Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid
cycle that catalyzes the reversible isomerization of citrate and isocitrate. Cis-aconitate is
formed as an intermediary product during the course of the reaction. In eukaryotes two
isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the
other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur
cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has
been shown that the aconitase family also contains the following proteins: - Iron-responsive
element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive
elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-
BP also expresses aconitase activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33)
(isopropylmalate isomerase), the enzyme that catalyzes the second step in the biosynthesis of
leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-
homoaconitate into homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for
proteins from the aconitase family, two conserved regions that contain the three cysteine
ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-
[GSTANI]-x(4)-[LIVMA] [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-
[GA] [The two C's bind the iron-sulfur center]

[1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

19. (Acyl-CoA dh) 

Acyl-CoA dehydrogenases signatures

Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha, beta-dehydrogenation
of acyl-CoA esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA
dehydrogenases are FAD flavoproteins. This family currently includes: - Five eukaryotic
isozymes that catalyze the first step of the beta-oxidation cycles for fatty acids with various
chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA
dehydrogenases. These enzymes are located in the mitochondrion. They are all
homotetrameric proteins of about 400 amino acid residues except VLCAD, which is a dimer
and which contains, in its mature form, about 600 residues. - Glutaryl-CoA dehydrogenase
(EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and
tryptophan. - Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the
catabolism of leucine. - Acyl-CoA dehydrogenases acsA and mmgC from Bacillus subtilis. -
Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. - Escherichia
coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as
signature patterns. The first is located in the center of these enzymes, the second in the C-
terminal section.

Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]

Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).
[2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J.,
Tanaka K. J. Biol. Chem. 264:16321-16331(1989).
[3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).
[4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol.
Microbiol. 13:775-786(1994).

20. (Acyl transf)

Acyl transferase domain 



Number of members: 161

[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS; Medline: 95286570 "The
Escherichia coli malonyl-CoA:acyl carrier protein transacylase at 1.5-A resolution. Crystal
structure of a fatty acid synthase component." J Biol Chem 1995;270:12961-12964.

21. Acylphosphatase signatures

Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of various acylphosphate
carboxyl-phosphate bonds such as carbamyl phosphate, succinylphosphate, 1,3-
diphosphoglycerate, etc. The physiological role of this enzyme is not yet clear.
Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known
isozymes. One seems to be specific to muscular tissues; the other, called 'organ-common
type', is found in many different tissues. While acylphosphatases have so far been
characterized only in vertebrates, there are a number of bacterial and archaebacterial
hypothetical proteins that are highly similar to that enzyme and that probably possess the
same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus
subtilis hypothetical protein yflL. - Archaeoglobus fulgidus hypothetical protein AF0818. Two
conserved regions were selected as signature patterns. The first is located in the N-terminal
section, while the second is found in the central part of the protein sequence.

Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R 

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G 

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995). 

[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997). 

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor
mediated endocytosis. In addition to clathrin, the CCV are composed of a number of other
components including oligomeric complexes which are known as adaptor or clathrin assembly
proteins (AP) complexes [1]. The adaptor complexes are believed to interact with the
cytoplasmic tails of membrane proteins, leading to their selection and concentration. In
mammals two types of adaptor complexes are known: AP-1, which is associated with the Golgi
complex, and AP-2, which is associated with the plasma membrane. Both AP-1 and AP-2 are
heterotetramers that consist of two large chains - the adaptins - (gamma and beta' in AP-1;
alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionarily
related proteins of about 50 Kd. Homologs of AP47 and AP50 have also been found in
Caenorhabditis elegans (genes unc-101 and ap50) [2] and yeast (gene APM1 or YAP54)
[3]. Some more divergent, but clearly evolutionarily related proteins have also been found in
yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one
located in the N-terminal region, the other from the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-
D-[LIVM]-[LIVMT]-E

Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y 

[ 1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990). 

[ 2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994). 

[ 3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T.
Eur. J. Biochem. 202:569-574(1991).



23. (Adenylsucc synt) 
Adenylosuccinate synthetase signatures 

Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis,
by catalyzing the GTP-dependent conversion of IMP and aspartic acid to AMP.
Adenylosuccinate synthetase has been characterized from various sources ranging from
Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are present -
one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two
conserved regions were selected as signature patterns. The first one is a perfectly conserved
octapeptide located in the N-terminal section and which is involved in GTP-binding [2]. The
second one includes a lysine residue known [2] to be essential for the enzyme's activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-
2485(1991).

[2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol.
254:431-446(1995).

[3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-
154(1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures 

S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the
activated methyl cycle, responsible for the reversible hydration of S-adenosyl-L-
homocysteine into adenosine and homocysteine. AdoHcyase is a ubiquitous enzyme which
binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein [1] of about
430 to 470 amino acids. Two highly conserved regions were selected as signature patterns.
The first pattern is located in the N-terminal section; the second is derived from a glycine-rich
region in the central part of AdoHcyase, a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]

Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC] 

[1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A.
89:6328-6332(1992).

25. AhpC/TSA family

This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and
thiol specific antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG; Proc Natl Acad Sci U S A
1994;91:7017-7021.

26. (Aldose epim) 

Aldose 1-epimerase putative active site

Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric
interconversion of D-glucose and other aldoses between their alpha- and beta-forms. The
sequence of mutarotase from two bacteria, Acinetobacter calcoaceticus and Streptococcus
thermophilus, is available [1]. It has also been shown that, on the basis of extensive sequence
similarities, a mutarotase domain seems to be present in the C-terminal half of the fungal
GAL10 protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase. The best
conserved region in the sequence of mutarotase is centered around a conserved histidine
residue which may be involved in the catalytic mechanism.

Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI] 

[1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair)

Alkylbase DNA glycosidases alkA family signature 

Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose
N-glycosidic bond to excise various alkylated bases from a damaged DNA polymer. In
Escherichia coli there are two alkylbase DNA glycosidases: one (gene tag) which is
constitutively expressed and which is specific for the removal of 3-methyladenine (EC
3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which
can remove a variety of alkylation products (EC 3.2.2.21). Tag and alkA do not share any
region of sequence similarity. In yeast there is an alkylbase DNA glycosidase (gene MAG1)
[2,3], which can remove 3-methyladenine or 7-methyladenine and which is structurally
related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While
the C- and N-terminal ends appear to be unrelated, there is a central region of about 130
residues which is well conserved. A portion of this region has been selected as a signature
pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D

[ 1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988). 

[ 2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E.C. EMBO J. 9:4563-4568(1990). 

[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990). 

28. Ammonium transporters signature 

A number of proteins involved in the transport of ammonium ions across a membrane as well
as some yet uncharacterized proteins have been shown [1,2] to be evolutionarily related. These
proteins are: - Yeast ammonium transporters MEP1, MEP2 and MEP3. - Arabidopsis
thaliana high affinity ammonium transporter (gene AMT1). - Corynebacterium glutamicum
ammonium and methylammonium transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical
protein MtCY338.09c. - Synechocystis strain PCC 6803 hypothetical proteins sll0108,
sll0537 and sll1017. - Methanococcus jannaschii hypothetical proteins MJ0058 and MJ1343.
- Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and M195.3. As
expected from their transport function, these proteins are highly hydrophobic and seem to
contain from 10 to 12 transmembrane domains. The best conserved region seems to be
located in the fifth (or sixth) transmembrane region and is used as a signature pattern.

Consensus pattern: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-
x(3)-[LIVMFYWA](2)-x-[GK]-x-R

[1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol.
Chem. 271:5398-5403(1996).

[3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).

29. (Arch_histone)
CBF/NF-Y subunits signatures

Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting
element found in the promoter and enhancer regions of a large number of genes in
eukaryotes. Amongst these proteins is one known as the CCAAT-binding factor (CBF) or
NF-Y [1]. CBF is a heteromeric transcription factor that consists of two different components,
both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream
activation site of the cytochrome C iso-1 gene (CYC1) as well as other genes involved in
mitochondrial electron transport and activates their expression. It also recognizes the
sequence CCAAT and is structurally and evolutionarily related to CBF. The first subunit of
CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and php3 in
fission yeast, is a protein of 116 to 210 amino-acid residues which contains a highly
conserved central domain of about 90 residues. This domain seems to be involved in DNA-
binding; a signature pattern has been developed from its central part. The second subunit of
CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission
yeast, is a protein of 265 to 350 amino-acid residues which contains a highly conserved
region of about 60 residues. This region, called the 'essential core' [2], seems to consist of two
subdomains: an N-terminal subunit-association domain and a C-terminal DNA recognition
domain. A signature pattern has been developed from a section of the subunit-association
domain.

Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C

Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D.
Nucleic Acids Res. 20:1087-1091(1992).

[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures

Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the
penultimate step in arginine biosynthesis: the ATP-dependent ligation of citrulline to
aspartate to form argininosuccinate, AMP and pyrophosphate [1,2]. In humans, a defect in the
AS gene causes citrullinemia, a genetic disease characterized by severe vomiting spells and
mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid
residues. An arginine seems to be important for the enzyme's catalytic mechanism. The
sequences of AS from various prokaryotes, archaebacteria and eukaryotes show significant
similarity. Two signature patterns have been selected for AS. The first is a highly conserved
stretch of nine residues located in the N-terminal extremity of these enzymes; the second is
derived from a conserved region which contains one of the conserved arginine residues.

Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]

Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y.,
Baumberg S., Glansdorff N. Gene 95:99-104(1990).

[2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats 

Approx. 40 amino acid repeat. Tandem repeats form super-helix of helices that is proposed to 
mediate interaction of beta-catenin with its ligands. CAUTION: This family does not contain 
all known armadillo repeats. 

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882. 

[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640. 

[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466. 

[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737.

[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.

[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 

This family is always found associated with GATase_2. Members of this family catalyse the
conversion of aspartate to asparagine.

33. Asparaginase_2
Asparaginase

Number of members: 12

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures 

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for 
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, 
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common 
folding pattern in their catalytic domain for the binding of ATP and amino acid which is 
different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-
[LIVMSTAG]-[LIVMFY]
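
The second class-II pattern uses braces, which in this notation list residues that are not allowed at a position, whereas brackets list residues that are allowed. As a hedged illustration, the Python sketch below maps each pattern element to a regular expression fragment, turning {..} into a negated character class. The helper names and the two test peptides are invented for the example, and the helper handles only the element types that occur in this one pattern (no repeat counts).

```python
import re

def element_to_regex(element: str) -> str:
    """Map one pattern element to a regex fragment (no repeat counts handled)."""
    if element == "x":
        return "."                          # any residue
    if element.startswith("{") and element.endswith("}"):
        return "[^" + element[1:-1] + "]"   # any residue except those listed
    return element                          # single residue or [..] set

# Second class-II signature, copied from the consensus line above.
CLASS_II_MOTIF = ("[GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-"
                  "[LIVMF]-x-[LIVMSTAG]-[LIVMFY]")
CLASS_II_RE = re.compile(
    "".join(element_to_regex(e) for e in CLASS_II_MOTIF.split("-")))

def has_motif(sequence: str) -> bool:
    """True when the sequence contains at least one match to the motif."""
    return CLASS_II_RE.search(sequence.upper()) is not None
```

Under these assumptions, the made-up peptide "GAGLDRLALL" matches, while "GDGLDRLALL" does not, because D is one of the residues excluded at the second position.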

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). 
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993). 
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). 

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). 
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). 
[ 6] Cusack S. Biochimie 75:1077-1081(1993). 

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-
255(1990).

[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990). 

35. (ArfGap)
Putative GTP-ase activating protein for Arf.

Putative zinc fingers with GTPase activating proteins (GAPs) towards the small GTPase, Arf.
The GAP of ARD1 stimulates GTP hydrolysis for ARD1 but not ARFs.

Number of members: 34

[1] Medline: 96324970. Identification and cloning of centaurin-alpha. A novel
phosphatidylinositol 3,4,5-trisphosphate-binding protein from rat brain. Hammonds-Odie LP,
Jackson TR, Profit AA, Blader IJ, Turck CW, Prestwich GD, Theibert AB; J Biol Chem
1996;271:18859-18868.

[2] Medline: 97296423. A target of phosphatidylinositol 3,4,5-trisphosphate with a zinc finger
motif similar to that of the ADP-ribosylation-factor GTPase-activating protein and two
pleckstrin homology domains. Tanaka K, Imajoh-Ohmi S, Sawada T, Shirai R, Hashimoto Y,
Iwasaki S, Kaibuchi K, Kanaho Y, Shirai T, Terada Y, Kimura K, Nagata S, Fukui Y; Eur J
Biochem 1997;245:512-519.

[3] Medline: 98112795. Molecular characterization of the GTPase-activating domain of ADP-
ribosylation factor domain protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem
1998;273:2553-2560.

36. Apolipoprotein.
Apolipoprotein A1/A4/E family.

This family includes: Swiss:P02647 Apolipoprotein A-I. Swiss:P06727 Apolipoprotein A-IV.
Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats which form
a pair of alpha helices.

Number of members: 42

[1] Medline: 91289138. Three-dimensional structure of the LDL receptor-binding domain of
human apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA;
Science 1991;252:1817-1822.

37. Amino acid permeases signature 

Amino acid permeases are integral membrane proteins involved in the transport of amino
acids into the cell. A number of such proteins have been found to be evolutionarily related
[1,2,3]. These proteins are: - Yeast general amino acid permeases (genes GAP1, AGP2 and
AGP3). - Yeast basic amino acid permease (gene ALP1). - Yeast Leu/Val/Ile permease (gene
BAP2). - Yeast arginine permease (gene CAN1). - Yeast dicarboxylic amino acid permease
(gene DIP5). - Yeast asparagine/glutamine permease (gene AGP1). - Yeast glutamine
permease (gene GNP1). - Yeast histidine permease (gene HIP1). - Yeast lysine permease
(gene LYP1). - Yeast proline permease (gene PUT4). - Yeast valine and tyrosine permease
(gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline
transport protein (gene HNM1/CTR1). - Yeast GABA permease (gene UGA4). - Yeast
hypothetical protein YKL174c. - Fission yeast protein isp5. - Fission yeast hypothetical
protein SpAC8A4.11. - Fission yeast hypothetical protein SpAC11D3.08c. - Emericella
nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid
permease INDA1. - Salmonella typhimurium L-asparagine permease (gene ansP). -
Escherichia coli aromatic amino acid transport protein (gene aroP). - Escherichia coli
D-serine/D-alanine/glycine transporter (gene cycA). - Escherichia coli GABA permease (gene
gabP). - Escherichia coli lysine-specific permease (gene lysP). - Escherichia coli
phenylalanine-specific permease (gene pheP). - Salmonella typhimurium proline-specific
permease (gene proY). - Escherichia coli and Klebsiella pneumoniae hypothetical protein
yeeF. - Escherichia coli and Salmonella typhimurium hypothetical protein yifK. - Bacillus
subtilis permeases rocC and rocE, which probably transport arginine or ornithine. These
proteins seem to contain up to 12 transmembrane segments. As a signature for this family of
proteins, the best conserved region, which is located in the second transmembrane segment,
has been selected.

Consensus pattern: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-
[LIVMFWSTAGC](2)-[STAGC]-x(3)-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-
[GA]-E-x(5)-[PSAL]-
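
The consensus patterns in this section follow the PROSITE pattern notation, so they can be translated mechanically into regular expressions for scanning protein sequences. The following is a minimal sketch of one way to do that in Python. The function name `prosite_to_regex` is hypothetical, and the sketch handles only the notation elements that appear in these patterns (residue sets in square brackets, excluded residues in braces, `x` wildcards and repeat counts), not the full PROSITE syntax. The permease pattern is entered with its variable gap written as x(2,3).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    # Drop a trailing period or hyphen, then split into hyphen-separated elements.
    for element in pattern.strip().rstrip(".").strip("-").split("-"):
        element = element.strip()
        # Separate the residue specification from an optional repeat such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {PC} means any residue except P or C
        # [ABC] sets and single residue letters are already valid regex fragments.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)

# Amino acid permease signature from the text, with the gap written as x(2,3).
AAP_PATTERN = (
    "[STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-"
    "[LIVMFWSTAGC](2)-[STAGC]-x(3)-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-"
    "[GA]-E-x(5)-[PSAL]"
)
AAP_REGEX = re.compile(prosite_to_regex(AAP_PATTERN))
```

A compiled pattern can then be applied with `AAP_REGEX.search(sequence)`, where `sequence` is a protein sequence in one-letter code. A hit only indicates that the signature is present; it does not by itself establish membership of the family.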

[ 1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988). 

[ 2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).

[ 3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci.
2:20-30(1993).

38. aakinase (1) Glutamate 5-kinase signature 

Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that 
catalyzes the first step in the biosynthesis of proline from glutamate, the ATP-dependent 
phosphorylation of L-glutamate into L-glutamate 5-phosphate. In eubacteria (gene proB) and 
yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants and mammals, it is a
bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see
<PDOC00940>). As a signature pattern, a highly conserved glycine- and alanine-rich region
located in the central section of these enzymes has been selected. Yeast hypothetical protein 
YHR033w is highly similar to GK. 

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]- 
x(2)-[GALV]-x(3)-G- 

[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992). 

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A.
89:9354-9358(1992).

aakinase (2) Aspartokinase signature 

Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product 
of this reaction can then be used in the biosynthesis of lysine or in the pathway leading to 
homoserine, which participates in the biosynthesis of threonine, isoleucine and methionine. In 
Escherichia coli, there are three different isozymes which differ in their sensitivity to 
repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are
bifunctional enzymes which both consist of an N-terminal AK domain and a C-terminal
homoserine dehydrogenase domain. AK1 is involved in threonine biosynthesis and AK2, in
that of methionine. The third isozyme, AK3 (gene lysC), is monofunctional and involved in 
lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature 
pattern for AK, a conserved region located in the N-terminal extremity has been selected. 

Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]- 

[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988). 

aakinase (3) Gamma-glutamyl phosphate reductase signature 

Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the
second step in the biosynthesis of proline from glutamate, the NADP-dependent reduction of
L-glutamate 5-phosphate into L-glutamate 5-semialdehyde and phosphate. In eubacteria
(gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional protein, while in plants and
mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR
domain. As a signature pattern, a conserved region that contains two histidine residues has
been selected. This region is located in the last third of GPR.

Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I-

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. 
Yeast 12:1021-1031(1996). 

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354- 
9358(1992). 

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide 
range of enzymes. 

[1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington
SJ, Silman I, Schrag J, Sussman JL, Verschueren KHG, Goldman A; Protein Eng
1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures

Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze 
phosphate esters, optimally at low pH. It has been shown [1] that a number of acid 
phosphatases, from both prokaryotes and eukaryotes, share two regions of sequence 
similarity, each centered around a conserved histidine residue. These two histidines seem 
to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the 
N-terminal section and forms a phosphohistidine intermediate while the second is located in 
the C-terminal section and possibly acts as proton donor. Enzymes belonging to this family
are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA). 

- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).

- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).

- Fission yeast acid phosphatase (gene pho1).

- Aspergillus phytases A and B (EC 3.1.3.8) (gene phyA and phyB). 

- Mammalian lysosomal acid phosphatase. 

- Mammalian prostatic acid phosphatase. 

- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 
and F26C11.1. 

Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the
phosphohistidine residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-
[LIVMFY]-x(2)-[STA] [H is an active site residue]. Sequences known to belong to this class
detected by the pattern: ALL, except for rat prostatic acid phosphatase, which seems to have
Tyr instead of the active site His.
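
The note about rat prostatic acid phosphatase illustrates a general limitation of signature patterns: a single substitution at a fixed position is enough to lose the match. The sketch below shows this with the second pattern written as a Python regular expression. The helper name `has_signature` is hypothetical, and both peptide fragments are synthetic examples constructed to fit the pattern; they are not taken from any real phosphatase sequence.

```python
import re

# Second histidine acid phosphatase pattern from the text, as a regular expression:
# [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA]
ACTIVE_SITE_HIS = re.compile(
    r"[LIVMF].[LIVMFAG].{2}[STAGI]HD[STANQ].[LIVM].{2}[LIVMFY].{2}[STA]"
)

def has_signature(sequence: str) -> bool:
    """Return True if the sequence contains the active-site histidine signature."""
    return ACTIVE_SITE_HIS.search(sequence) is not None

# Synthetic fragments (assumptions for illustration only).
with_his = "LQLQQSHDSQLQQLQQS"   # histidine at the active-site position
with_tyr = "LQLQQSYDSQLQQLQQS"   # same fragment with His replaced by Tyr

print(has_signature(with_his))  # True
print(has_signature(with_tyr))  # False
```

A sequence that fails the pattern in this way may still belong to the family, so a negative result is better treated as "not detected" than as "not a member".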

[ 1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 
266:2313-2319(1991). 

[ 2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol.
Chem. 267:22830-22836(1992).

[ 3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures

Aconitase (aconitate hydratase) (EC 4.2.1.3 ) [1] is the enzyme from the tricarboxylic acid 
cycle that catalyzes the reversible isomerization of citrate and isocitrate. Cis-aconitate is 
formed as an intermediary product during the course of the reaction. In eukaryotes two 
isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the 
other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur 
cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has 
been shown that the aconitase family also contains the following proteins: - Iron-responsive 
element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive 
elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and
delta-aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA.
IRE-BP also expresses aconitase activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33)
(isopropylmalate isomerase), the enzyme that catalyzes the second step in the biosynthesis of 
leucine. - Homoaconitase (EC 4.2.1.36 ) (homoaconitate hydratase), an enzyme that 
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts
cis-homoaconitate into homoisocitric acid. - Escherichia coli protein ybhJ.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-
[GSTANI]-x(4)-[LIVMA] [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-
[GA] [The two C's bind the iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

42. Actins signatures

Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic
cells. In vertebrates there are three groups of actin isoforms: alpha, beta and gamma. The
alpha actins are found in muscle tissues and are a major constituent of the contractile
apparatus. The beta and gamma actins co-exist in most cell types as components of the
cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms
which are probably involved in a variety of functions such as cytoplasmic streaming, cell
shape determination, tip growth, graviperception, cell wall deposition, etc. Actin exists either
in a monomeric form (G-actin) or in a polymerized form (F-actin). Each actin monomer can
bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a
protein of 374 to 379 amino acid residues. The structure of actin has been highly
conserved in the course of evolution. Recently some divergent actin-like proteins have been
identified in several species. These proteins are: - Centractin (actin-RPV) from mammals,
fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii (actin-II). Centractin
seems to be a component of a multi-subunit centrosomal complex involved in
microtubule-based vesicle motility. This subfamily is also known as ARP1. - ARP2 subfamily,
which includes chicken ACTL, yeast ACT2, Drosophila 14D and C. elegans actC. - ARP3
subfamily, which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and fission
yeast act2. - ARP4 subfamily, which includes yeast ACT3 and Drosophila 13E. Three
signature patterns have been developed. The first two are specific to actins and span
positions 54 to 64 and 357 to 365. The last signature picks up both actins and the actin-like
proteins and corresponds to positions 106 to 118 in actins.

Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G-
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]-
Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-
[KR]-
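
Because the text quotes the positions that each signature spans in actins, it can be useful to report hits as 1-based, inclusive residue ranges. The sketch below scans a sequence with the three patterns, written as Python regular expressions. The dictionary keys and the function name `scan` are hypothetical, and the test sequence is synthetic: alanine filler with one motif instance placed at each quoted position, not a real actin.

```python
import re

# The three actin signatures from the text, written as regular expressions.
ACTIN_SIGNATURES = {
    "actins_1": re.compile(r"[FY][LIV]G[DE]EAQ.[RKQ]{2}G"),
    "actins_2": re.compile(r"W[IV][STA][RK].[DE]Y[DNE][DE]"),
    "actins_and_actin_like": re.compile(
        r"[LM][LIVM]TE[GAPQ].[LIVMFYWHQ]N[PSTAQ].{2}N[KR]"
    ),
}

def scan(sequence: str) -> dict:
    """Map each signature name to a list of 1-based, inclusive (start, end) hits."""
    return {
        name: [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]
        for name, regex in ACTIN_SIGNATURES.items()
    }

# Synthetic sequence (an assumption for illustration, not a real actin).
synthetic = (
    "A" * 53 + "YVGDEAQSKRG"      # placed at positions 54 to 64
    + "A" * 41 + "LLTEAPLNPKANR"  # placed at positions 106 to 118
    + "A" * 238 + "WISKQEYDE"     # placed at positions 357 to 365
    + "A" * 10
)

for name, hits in scan(synthetic).items():
    print(name, hits)
```

Under this scheme, a sequence that hits only the third signature would be a candidate actin-like protein, while hits on the first two would point to an actin proper.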

[ 1] Sheterline P., Clayton J., Sparrow J.C. (In) Actins, 3rd Edition, Academic Press Ltd,
London, (1996).

[ 2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[ 3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).

[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).

43. Adenylate kinase signature 

Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the
reversible transfer of a phosphate group from MgATP to AMP (MgATP + AMP = MgADP + ADP).
In mammals there are three different isozymes: - AK1 (or myokinase), which is cytosolic. -
AK2, which is located in the outer compartment of mitochondria. - AK3 (or GTP:AMP
phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP
instead of MgATP. The sequence of AK has also been obtained from different bacterial
species and from plants and fungi. Two other enzymes have been found to be evolutionarily
related to AK. These are: - Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2], which
catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP. - Slime
mold UMP-CMP kinase (EC 2.7.4.14) [3], which catalyzes the transfer of a phosphate group
from ATP to either CMP or UMP to form CDP or UDP and ADP. Several regions of AK family
enzymes are well conserved, including the ATP-binding domains. The most conserved of all
regions has been selected as a signature for this type of enzyme. This region includes an
aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in
a salt bridge. It also includes an arginine residue whose modification leads to inactivation
of the enzyme.

Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]- 


[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987). 

[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun.
165:464-473(1989).

[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem.
265:6339-6345(1990).

[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993). 

44. (adh_short) Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of
enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the
first member of this family to be characterized was Drosophila alcohol dehydrogenase, this
family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most
members of this family are proteins of about 250 to 300 amino acid residues. The proteins
currently known to belong to this family are listed below. - Alcohol dehydrogenase (EC
1.1.1.1) from insects such as Drosophila. - Acetoin dehydrogenase (EC 1.1.1.5) from
Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase (BDH) (EC
1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial
species (gene phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. -
3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni. -
20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans. - Ribitol
dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol
17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69)
from Gluconobacter oxydans (gene gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC
1.1.1.100) from Escherichia coli (gene fabG) and from plants. - Retinol dehydrogenase (EC
1.1.1.105) from mammals. - 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate
2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella
pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141)
from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from
mammals. - 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli
(gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii. -
NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals. - Tropinone
reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine
1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol
2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene
reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase 1 (EC 1.1.1.253)
(gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-
carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and
Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC
1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase
(EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC
1.3.1.19) from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate
dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene
dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals. - Lignin
degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase
from Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus
parasiticus (gene VER1). - Putative keto-acyl reductases from Streptomyces polyketide
biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase from the
peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly
repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation
protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the
modification of the nodulation Nod factor fatty acyl chain. - Nitrogen fixation protein fixR
from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE, which is involved in the
biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1
(FVT1). - Mouse adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination
protein TASSELSEED 2. - Sarcophaga peregrina 25 Kd development specific protein. -
Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical protein encoded in
the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli
hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli
hypothetical protein yjgU. - Escherichia coli hypothetical protein yohF. - Bacillus subtilis
hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD. - Bacillus subtilis
hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical
protein YIR035c. - Yeast hypothetical protein YIR036c. - Yeast hypothetical protein
YKL055c. - Fission yeast hypothetical protein SpAC23D3.11. One of the best conserved
regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been
selected as a signature pattern for this family of proteins. The tyrosine residue participates
in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-
{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-
[GSACQRHM] [Y is an active site residue]
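
The annotation on this pattern marks one position as the active-site tyrosine, and the description also singles out a conserved lysine. When the pattern is written as a regular expression, named groups are one way to recover the sequence positions of those residues from a match. The sketch below assumes this approach; the group names, the function name `conserved_positions` and the test fragment are all hypothetical, and the fragment is synthetic rather than a real SDR sequence.

```python
import re

# SDR signature from the text, with named groups around the two conserved residues.
# The element {PC} (any residue except P or C) becomes the negated set [^PC].
SDR_SIGNATURE = re.compile(
    r"[LIVSPADNK].{12}(?P<tyr>Y)[PSTAGNCV][STAGNQCIVM][STAGC](?P<lys>K)"
    r"[^PC][SAGFYR][LIVMSTAGD].{2}[LIVMFYW].{3}[LIVMFYWGAPTHQ][GSACQRHM]"
)

def conserved_positions(sequence: str):
    """Return 1-based positions of the conserved Tyr and Lys, or None if no match."""
    match = SDR_SIGNATURE.search(sequence)
    if match is None:
        return None
    return match.start("tyr") + 1, match.start("lys") + 1

# Synthetic fragment (an assumption for illustration only).
fragment = "MM" + "L" + "Q" * 12 + "YSASKAAL" + "QQ" + "L" + "QQQ" + "LG"
print(conserved_positions(fragment))  # (16, 20)
```

The function returns only the first match in the sequence, which is a simplification; a protein with tandemly repeated domains, such as the Candida tropicalis enzyme mentioned above, would need `finditer` to report every occurrence.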


[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D.
Biochemistry 34:6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J.
Biochem. 204:113-120(1992).



45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of
enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the
first member of this family to be characterized was Drosophila alcohol dehydrogenase, this
family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol dehydrogenases. Most
members of this family are proteins of about 250 to 300 amino acid residues. The proteins
currently known to belong to this family are listed below. - Alcohol dehydrogenase (EC
1.1.1.1) from insects such as Drosophila. - Acetoin dehydrogenase (EC 1.1.1.5) from
Klebsiella terrigena (gene budC). - D-beta-hydroxybutyrate dehydrogenase (BDH) (EC
1.1.1.30) from mammals. - Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial
species (gene phbB or phaB). - Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus. -
3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni. -
20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans. - Ribitol
dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes. - Estradiol
17-beta-dehydrogenase (EC 1.1.1.62) from human. - Gluconate 5-dehydrogenase (EC 1.1.1.69)
from Gluconobacter oxydans (gene gno). - 3-oxoacyl-[acyl-carrier protein] reductase (EC
1.1.1.100) from Escherichia coli (gene fabG) and from plants. - Retinol dehydrogenase (EC
1.1.1.105) from mammals. - 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from
Escherichia coli and Erwinia chrysanthemi (gene kduD). - Sorbitol-6-phosphate
2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella
pneumoniae (gene sorD). - 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141)
from human. - Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from
mammals. - 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli
(gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii. -
NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals. - Tropinone
reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants. - N-acylmannosamine
1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8. - D-arabinitol
2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi. - Tetrahydroxynaphthalene
reductase (EC 1.1.1.252) from Magnaporthe grisea. - Pteridine reductase 1 (EC 1.1.1.253)
(gene PTR1) from Leishmania. - 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase
(EC 1.1.-.-) from Pseudomonas paucimobilis. - Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-
carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and
Pseudomonas putida (gene xylL). - Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC
1.3.1.-) (gene bphB) from various Pseudomonaceae. - Cis-toluene dihydrodiol dehydrogenase
(EC 1.3.1.-) from Pseudomonas putida (gene todD). - Cis-benzene glycol dehydrogenase (EC
1.3.1.19) from Pseudomonas putida (gene bnzE). - 2,3-dihydro-2,3-dihydroxybenzoate
dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene
dhbA). - Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals. - Lignin
degradation enzyme ligD from Pseudomonas paucimobilis. - Agropine synthesis reductase
from Agrobacterium plasmids (gene mas1). - Versicolorin reductase from Aspergillus
parasiticus (gene VER1). - Putative keto-acyl reductases from Streptomyces polyketide
biosynthesis operons. - A trifunctional hydratase-dehydrogenase-epimerase from the
peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly
repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity. - Nodulation
protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the
modification of the nodulation Nod factor fatty acyl chain. - Nitrogen fixation protein fixR
from Bradyrhizobium japonicum. - Bacillus subtilis protein dltE, which is involved in the
biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1
(FVT1). - Mouse adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination
protein TASSELSEED 2. - Sarcophaga peregrina 25 Kd development specific protein. -
Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical protein encoded in
the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli
hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli
hypothetical protein yjgU. - Escherichia coli hypothetical protein yohF. - Bacillus subtilis
hypothetical protein yoxD. - Bacillus subtilis hypothetical protein ywfD. - Bacillus subtilis
hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical
protein YIR035c. - Yeast hypothetical protein YIR036c. - Yeast hypothetical protein
YKL055c. - Fission yeast hypothetical protein SpAC23D3.11. One of the best conserved
regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been
used as a signature pattern for this family of proteins. The tyrosine residue participates in
the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-
{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-
[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D.
Biochemistry 34:6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J.
Biochem. 204:113-120(1992).

46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to
acetaldehyde with the concomitant reduction of NAD [1]. Currently three structurally and
catalytically different types of alcohol dehydrogenases are known: - Zinc-containing
'long-chain' alcohol dehydrogenases. - Insect-type, or 'short-chain' alcohol dehydrogenases. -
Iron-containing alcohol dehydrogenases. Zinc-containing ADH's [2,3] are dimeric or tetrameric
enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for
catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine or
histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine.
Zinc-containing ADH's are found in bacteria, mammals, plants, and in fungi. In most species
there is more than one isozyme (for example, humans have at least six isozymes, yeast has
three, etc.). A number of other zinc-dependent dehydrogenases are closely related to zinc
ADH [4]; these are: - Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase). - Sorbitol
dehydrogenase (EC 1.1.1.14). - Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol
dehydrogenase). - Threonine 3-dehydrogenase (EC 1.1.1.103). - Cinnamyl-alcohol
dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the
biosynthesis of lignin. - Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251). -
Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6]. - Escherichia coli
starvation sensing protein rspB. - Escherichia coli hypothetical protein yjgB. - Escherichia
coli hypothetical protein yjgV. - Escherichia coli hypothetical protein yjjN. - Yeast
hypothetical protein YAL060w (FUN49). - Yeast hypothetical protein YAL061w (FUN50). -
Yeast hypothetical protein YCR105w. The pattern that has been developed to detect this class
of enzymes is based on a conserved region that includes a histidine residue which is the
second ligand of the catalytic zinc atom. This family also includes NADP-dependent quinone
oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in
mammals where, in some species such as rodents, it has been recruited as an eye lens protein
and is known as zeta-crystallin [7]. The sequence of quinone oxidoreductase is distantly
related to that of other zinc-containing alcohol dehydrogenases and it lacks the zinc-ligand
residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related
to qor. A specific pattern has been developed for this subfamily.



Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]
Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]-

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition)
11:104-190(1975).

[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987). 
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).

[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M.,
Joernvall H. FEBS Lett. 324:9-14(1993).

[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).

[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J.
Bacteriol. 166:1089-1095(1986).

[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., 
Zigler J.S. Jr. FEBS Lett. 322:240-244(1993). 

47. (aldedh) Aldehyde dehydrogenases active sites 

Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide
variety of aliphatic and aromatic aldehydes. In mammals at least four different forms of the
enzyme are known [1]: class-1 (or Ald C), a tetrameric cytosolic enzyme; class-2 (or Ald M),
a tetrameric mitochondrial enzyme; class-3 (or Ald D), a dimeric cytosolic enzyme; and class
IV, a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal
and bacterial species. A number of enzymes are known to be evolutionarily related to
aldehyde dehydrogenases; these enzymes are listed below. - Plant and bacterial
betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that catalyzes the last step in
the biosynthesis of betaine. - Plant and bacterial NADP-dependent glyceraldehyde-3-phosphate
dehydrogenase (EC 1.2.1.9). - Escherichia coli succinate-semialdehyde dehydrogenase
(NADP+) (EC 1.2.1.16) (gene gabD) [3], which oxidizes succinate semialdehyde into
succinate. - Escherichia coli lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4]. -
Mammalian succinate semialdehyde dehydrogenase (NAD+) (EC 1.2.1.24). - Escherichia coli
phenylacetaldehyde dehydrogenase (EC 1.2.1.39). - Escherichia coli
5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC). - Pseudomonas
putida 2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme
in the meta-cleavage pathway for the degradation of phenols, cresols and catechol. -
Bacterial and mammalian methylmalonate-semialdehyde dehydrogenase (MMSDH) (EC
1.2.1.27) [6], an enzyme involved in the distal pathway of valine catabolism. - Yeast
delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which converts
proline to glutamate. - Bacterial multifunctional putA protein, which contains a
delta-1-pyrroline-5-carboxylate dehydrogenase domain. - 26G, a garden pea protein of unknown
function which is induced by dehydration of shoots [8]. - Mammalian
formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [9]. This is a cytosolic enzyme
responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate
into tetrahydrofolate. It is a protein of about 900 amino acids which consists of three
domains; the C-terminal domain (480 residues) is structurally and functionally related to
aldehyde dehydrogenases. - Yeast hypothetical protein YBR006w. - Yeast hypothetical
protein YER073w. - Yeast hypothetical protein YHR039c. - Caenorhabditis elegans
hypothetical protein F01F1.6. A glutamic acid and a cysteine residue have been implicated in
the catalytic activity of mammalian aldehyde dehydrogenase. These residues are conserved in
all the enzymes of this family. Two patterns have been derived for this family, one for each
of the active site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]

Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]
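
The consensus patterns in this section use a compact notation: square brackets list the residues allowed at a position, x stands for any residue, and a number in parentheses is a repeat count or range. The sketch below shows one possible way to translate that notation into a regular expression and locate the two aldehyde dehydrogenase active-site residues. The helper name prosite_to_regex and the sequence TOY are hypothetical and introduced only for illustration; TOY is a synthetic string, not a real protein.

```python
import re

# One pattern element: a residue (A), a set ([ST]), an exclusion ({P}) or the
# wildcard x, optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.replace(" ", "").strip("-").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


ALDH_GLU = "[LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV]"
ALDH_CYS = "[FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR]"

# Synthetic sequence built to contain one instance of each pattern (assumption).
TOY = "MKLELGGKSPAAAFTNSGQVCTAGSR"

glu_hit = re.search(prosite_to_regex(ALDH_GLU), TOY)
cys_hit = re.search(prosite_to_regex(ALDH_CYS), TOY)
print(glu_hit.group(), glu_hit.start())
print(cys_hit.group(), cys_hit.start())
```

The same translation should apply to the other consensus patterns in this document, provided they are first cleaned of stray spaces and trailing hyphens.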

[ 1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[ 3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[ 4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[ 5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[ 6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[ 7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[ 8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[ 9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

48. Aldo/keto reductase family signatures 

The aldo-keto reductase family [1,2] groups together a number of structurally and
functionally related NADPH-dependent oxidoreductases as well as some other proteins. The
proteins known to belong to this family are:

- Aldehyde reductase (EC 1.1.1.2).
- Aldose reductase (EC 1.1.1.21).
- 3-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.50), which terminates androgen action by converting 5-alpha-dihydrotestosterone to 3-alpha-androstanediol.
- Prostaglandin F synthase (EC 1.1.1.188), which catalyzes the reduction of prostaglandins H2 and D2 to F2-alpha.
- D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple.
- Morphine 6-dehydrogenase (EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA).
- Chlordecone reductase (EC 1.1.1.225), which reduces the pesticide chlordecone (kepone) to the corresponding alcohol.
- 2,5-diketo-D-gluconic acid reductase (EC 1.1.1.-), which catalyzes the reduction of 2,5-diketogluconic acid to 2-keto-L-gulonic acid, a key intermediate in the production of ascorbic acid.
- NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This enzyme reduces xylose to xylitol.
- Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20).
- 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.99.6), which catalyzes the reduction of delta(4)-3-oxosteroids.
- A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone.
- Frog eye lens rho crystallin.
- Yeast GCY protein, whose function is not known.
- Leishmania major P110/11E protein. P110/11E is a developmentally regulated protein whose abundance is markedly elevated in promastigotes compared with amastigotes. Its exact function is not yet known.
- Escherichia coli hypothetical protein yafB.
- Escherichia coli hypothetical protein yghE.
- Yeast hypothetical protein YBR149w.
- Yeast hypothetical protein YHR104w.
- Yeast hypothetical protein YJR096w.

These proteins all have about 300 amino acid residues. Three consensus patterns have been
developed that are specific to this family of proteins. The first one is located in the N-terminal
section of these proteins. The second pattern is located in the central section. The third pattern,
located in the C-terminal section, is centered on a lysine residue whose chemical modification, in
aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G

Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]

Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547- 
9551(1989). 

[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

49. Alpha amylase

This family is classified as family 13 of the glycosyl hydrolases. The structure is an
8-stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain
protruding between beta strand 3 and alpha helix 3, and a carboxyl-terminal Greek key
beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A, J Mol Biol 1994;235:1560- 
1584. 

50. Aminotransferases class-I pyridoxal-phosphate attachment site 
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate
dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a
lysine residue. On the basis of sequence similarity, these various enzymes can be grouped
[1,2] into subfamilies. One of these, called class-I, currently consists of the following
enzymes:

- Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes: one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found (gene aspC).
- Tyrosine aminotransferase (EC 2.6.1.5), which catalyzes the first step in tyrosine catabolism by reversibly transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate.
- Aromatic aminotransferase (EC 2.6.1.57), involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB).
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first step in ethylene biosynthesis.
- Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein YJL060w.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is
sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K.,
Okamoto A., Higuchi T., Soda K. J. Biol. Chem. 266:2567-2572(1991). 

51. Aminotransferases class-II pyridoxal-phosphate attachment site 
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate
dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a
lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1]
into subfamilies. One of these, called class-II, currently consists of the following enzymes:

- Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form 2-amino-3-oxobutanoate (gene kbl).
- 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine to form 5-aminolevulinate.
- 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme (gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA to alanine to form 8-amino-7-oxononanoate.
- Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate.
- Serine palmitoyltransferase (EC 2.3.1.50) from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form 3-ketosphinganine.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is
sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1991). 

52. Aminotransferases class-III pyridoxal-phosphate attachment site
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate
dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a
lysine residue. On the basis of sequence similarity, these various enzymes can be grouped
[1,2] into subfamilies. One of these, called class-III, currently consists of the following
enzymes:

- Acetylornithine aminotransferase (EC 2.6.1.11), which catalyzes the transfer of an amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semialdehyde and glutamic acid.
- Ornithine aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate, yielding glutamic-5-semialdehyde and glutamic acid.
- Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18), which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays a pivotal role in omega-amino acid metabolism.
- 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid.
- DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid (7-KAP) to form 7,8-diaminopelargonic acid (DAPA).
- 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide.
- Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the enzyme involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on carbon 2 of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid.
- Bacillus subtilis aminotransferase yhxA.
- Bacillus subtilis aminotransferase yodT.
- Haemophilus influenzae aminotransferase HI0949.
- Caenorhabditis elegans aminotransferase T01B11.2.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is
sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACLI]-x-[GSAD]-x(12,16)-D-[LIVMFC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC] [K is the pyridoxal-P attachment site]
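
Because this pattern contains two variable-length gaps, written here as x(0,1) and x(12,16), a match does not have a single fixed length. The following sketch adds up the minimum and maximum width of each element to estimate how many residues a match may span. The function name span_range is hypothetical, and the parser handles only the notation used in this document.

```python
import re

# An element is x, a single residue, or a bracketed set, optionally followed
# by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def span_range(pattern: str) -> tuple:
    """Return (shortest, longest) possible match length for a consensus pattern."""
    shortest = longest = 0
    for element in pattern.replace(" ", "").strip("-").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        _, low, high = match.groups()
        low_n = int(low) if low is not None else 1      # no count means exactly one
        high_n = int(high) if high is not None else low_n
        shortest += low_n
        longest += high_n
    return shortest, longest


CLASS_III = ("[LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACLI]-x-"
             "[GSAD]-x(12,16)-D-[LIVMFC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-"
             "[GSTADNV]-[GSAC]")

print(span_range(CLASS_III))
```

Under these assumptions a class-III match covers between 37 and 42 residues, which is one way to sanity-check a reported hit.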

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

53. Ank repeat

There is no clear separation between noise and signal on the HMM search. Ankyrin repeats
generally consist of a beta, alpha, alpha, beta order of secondary structures.
The repeats associate to form a higher order structure.

[1] A, Holak TA, FEBS Lett 1997;401:127-132. 

[2] Lux SE, John KM, Bennett V, Nature 1990;345:736-739. 

54. Aminotransferases class-IV signature 

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate 
dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a 
lysine residue. On the basis of sequence similarity, these various enzymes can be grouped 
[1,2] into subfamilies. One of these, called class-IV, currently consists of the following 
enzymes: 

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a 
bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible 
transfer of an amino group from 4-methyl-2-oxopentanoate to glutamate, to form 
leucine and 2-oxoglutarate. 

- D-alanine aminotransferase (EC 2.6.1.21). A bacterial enzyme which catalyzes the
transfer of the amino group from D-alanine (and other D-amino acids) to
2-oxoglutarate, to form pyruvate and D-glutamate.

- 4-amino-4-deoxychorismate (ADC) lyase (gene pabC). A bacterial enzyme that 
converts ADC into 4-aminobenzoate (PABA) and pyruvate. 

The above enzymes are proteins of about 270 to 415 amino-acid residues that share a
few regions of sequence similarity. Surprisingly, the best-conserved region does not include
the lysine residue to which the pyridoxal-phosphate group is known to be attached in ilvE.
The region that has been selected as a signature pattern is located some 40 residues on the
C-terminal side of the PLP-lysine.

Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]-x-[KR]

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[2] Bairoch A. Unpublished observations (1992).

55. Aminotransferases class-V pyridoxal-phosphate attachment site
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate
dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a
lysine residue. On the basis of sequence similarity, these various enzymes can be grouped
[1,2] into subfamilies. One of these, called class-V, currently consists of the following
enzymes:

- Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is highly similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine aminotransferase [3].
- Serine-glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium extorquens.
- Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria.
- Isopenicillin N epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the reversible isomerization of isopenicillin N and penicillin N.
- NifS, a protein of the nitrogen fixation operon of some bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi (gene NFS1 or SPL1).
- The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-).
- Hypothetical protein ycbU from Bacillus subtilis.
- Hypothetical protein YFL030w from yeast.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is
sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[ 1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[ 2] Bairoch A. Unpublished observations (1992). 

[ 3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989). 



56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with
membranes. They bind to phospholipid bilayers in the presence of micromolar free calcium
concentration. The binding is specific for calcium and for acidic phospholipids. Annexins
have been claimed to be involved in cytoskeletal interactions, phospholipase inhibition,
intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists
of an N-terminal domain of variable length followed by four or eight copies of a conserved
segment of sixty-one residues. The repeat (sometimes known as an 'endonexin fold') consists
of five alpha-helices that are wound into a right-handed superhelix [7]. The proteins known to
belong to the annexin family are listed below:

- Annexin I (Lipocortin 1) (Calpactin 2) (p35) (Chromobindin 9).
- Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III).
- Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I).
- Annexin VI (Lipocortin 6) (Protein III) (Chromobindin 20) (p68) (p70). This is the only known annexin that contains 8 (instead of 4) repeats.
- Annexin VII (Synexin).
- Annexin VIII (Vascular anticoagulant-beta) (VAC-beta).
- Annexin IX from Drosophila.
- Annexin X from Drosophila.
- Annexin XI (Calcyclin-associated annexin) (CAP-50).
- Annexin XII from Hydra vulgaris.
- Annexin XIII (Intestine-specific annexin) (ISA).

The signature pattern for this domain spans positions 9 to 61 of the repeat and includes the only
perfectly conserved residue (an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]
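
The text states that this pattern covers positions 9 to 61 of the repeat and that the conserved arginine sits at position 22. That arithmetic can be checked directly from the pattern, as in the sketch below, which walks through the elements and assigns each one its repeat coordinates. The function name element_positions is hypothetical, and the parser assumes fixed-width elements only, which holds for this pattern.

```python
import re

ANNEXIN = ("[TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-"
           "[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]")


def element_positions(pattern: str, first_position: int) -> list:
    """Return (element, start, end) triples, numbered from first_position."""
    position = first_position
    layout = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        width = int(match.group(2)) if match.group(2) else 1
        layout.append((match.group(1), position, position + width - 1))
        position += width
    return layout


layout = element_positions(ANNEXIN, first_position=9)
arginine_position = next(start for elem, start, _ in layout if elem == "R")
last_position = layout[-1][2]
print(arginine_position, last_position)  # 22 61
```

The 53 pattern positions starting at repeat position 9 end at position 61, and the single R element falls on position 22, in agreement with the description.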

[ 1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994). 

[ 2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).

[ 3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989). 

[ 4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48- 
50(1989). 

[ 5] Klee C.B. Biochemistry 27:6645-6653(1988). 

[ 6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994). 

[ 7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990). 

[ 8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in
protein trafficking. They may modulate vesicle budding and uncoating within the Golgi
apparatus. ARF's also act as allosteric activators of cholera toxin ADP-ribosyltransferase
activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of
ARF are present in mammals and three in budding yeast. The ARF family also includes
proteins highly related to ARF's but which lack the cholera toxin cofactor activity; they are
collectively known as ARL's (ARF-like). ARD1 is a 64 Kd mammalian protein of unknown
biological function that contains an ARF domain at its C-terminal extremity. Proteins from
the ARF family are generally included in the RAS 'superfamily' of small GTP-binding
proteins [5], but they are only slightly related to the other RAS proteins. They also differ
from RAS proteins in that they lack cysteine residues at their C-termini and are therefore not
subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have not yet
been shown to be modified in such a fashion). A conserved region in the C-terminal part of
ARF's and ARL's has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif
'A' (P-loop) (see <PDOC00017>).

[ 1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995). 

[ 2] Moss J., Vaughan M. Cell. Signal. 4.367-399(1993). 

[ 3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993). 

[ 4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994). 

[ 5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991). 

(arf_2) ATP/GTP-binding site motif A (P-loop) 

From sequence comparisons and crystallographic data analysis it has been shown
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number
of more or less conserved sequence motifs. The best conserved of these motifs is a
glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix.
This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous
ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for
which the relevance of the presence of such a motif has been noted are listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, GO, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape
detection because the structure of their ATP-binding site is completely different from that of
the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In
other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this
is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
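
As a minimal sketch of how this short motif behaves in practice, the code below scans two sequence fragments with the strict pattern and with a relaxed variant that also accepts Gly in the last position, the adenylate kinase deviation mentioned above. The relaxed pattern, the function name find_sites, and both fragments (RAS_LIKE and AK_LIKE) are illustrative assumptions and are not taken from the specification.

```python
import re

# Lookahead wrappers so that overlapping candidate sites are also reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")    # [AG]-x(4)-G-K-[ST]
RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")  # hypothetical variant allowing Gly last


def find_sites(regex, sequence):
    """Return (1-based start position, matched residues) for every site."""
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Illustrative fragments only (assumptions), written to resemble a Ras-type
# P-loop and an adenylate-kinase-type loop respectively.
RAS_LIKE = "KLVVVGAGGVGKSALTIQ"
AK_LIKE = "IIFVVGGPGSGKGTQCEK"

print(find_sites(P_LOOP, RAS_LIKE))
print(find_sites(P_LOOP, AK_LIKE))
print(find_sites(RELAXED, AK_LIKE))
```

The strict pattern reports the Ras-like loop but not the adenylate-kinase-like one, while the relaxed variant recovers it. Because the motif is only eight residues long, chance matches in unrelated proteins are to be expected, so a hit is best treated as a hint rather than as evidence of nucleotide binding.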

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

58. Arginase family signatures

The following enzymes have been shown [1] to be evolutionarily related:

- Arginase (EC 3.5.3.1), a ubiquitous enzyme which catalyzes the degradation of arginine to ornithine and urea [2].
- Agmatinase (EC 3.5.3.11) (agmatine ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea.
- Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes N-formimino-glutamate into glutamate and formamide.
- Hypothetical proteins from methanogenic archaebacteria.

These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that
contain charged residues which are involved in the binding of the two manganese ions [3]
can be used as signature patterns.

Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]

Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]

Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]

[ 1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[ 2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[ 3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).



59. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed
family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses
and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which
consist of two domains. Each domain contains an active site centered on a catalytic aspartyl
residue. The two domains most probably evolved from the duplication of an ancestral gene
encoding a primordial domain. Currently known eukaryotic aspartyl proteases are:

- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases.
- Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease
which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the
protease is encoded as a segment of a polyprotein which is cleaved during the maturation
process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag
polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl
proteases and around the single active site of the viral proteases allows us to develop a single
signature pattern for both groups of protease.

Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
Note: these proteins belong to families A1 and A2 in the classification of peptidases [4,E1].
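
Since a two-domain eukaryotic enzyme carries two catalytic aspartates while a viral protease chain carries one, the number of pattern matches in a sequence gives a rough first indication of which group it might belong to. The sketch below counts matches and reports the position of each catalytic Asp. The function name active_sites and both sequences are hypothetical; the sequences are synthetic constructs assembled to contain pepsin-like and retroviral-like motifs, not real database entries.

```python
import re

# Regex form of the consensus pattern above; the Asp is the fourth element.
ASP_SITE = re.compile(
    r"[LIVMFGAC][LIVMTADN][LIVFSA]D[ST]G[STAV][STAPDENQ].[LIVMFSTNC].[LIVMFGTA]"
)


def active_sites(sequence):
    """Return the 1-based position of the catalytic Asp in each pattern match."""
    return [m.start() + 4 for m in ASP_SITE.finditer(sequence)]


# Synthetic test sequences (assumptions), each motif separated by filler residues.
TWO_DOMAIN = "MK" + "VIFDTGSSNLWV" + "PSAAAAKK" + "AIVDTGTSLLTG" + "PE"
SINGLE_SITE = "PQ" + "ALLDTGADDTVL" + "EE"

for name, seq in (("two-domain", TWO_DOMAIN), ("single-site", SINGLE_SITE)):
    sites = active_sites(seq)
    print(name, len(sites), sites)
```

A count of two is consistent with a monomeric two-domain enzyme and a count of one with a chain that would need to dimerize, but the count alone does not establish either.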

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981).
[ 2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

60. (BIRA) Biotin repressor

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci 
USA 1992;89:9257-9261. 

61. BTB/POZ domain 

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is
present near the N-terminus of a fraction of zinc finger (zf-C2H2) proteins and in proteins
that contain the Kelch motif, such as Kelch and a family of pox virus proteins. The BTB/POZ
domain mediates homomeric dimerisation and in some instances heteromeric dimerisation [2].
The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a
tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of
alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ
domains from several zinc finger proteins have been shown to mediate transcriptional repression
and to interact with components of histone deacetylase co-repressor complexes including N-CoR
and SMRT [4,5,6]. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 
1994;91:10717-10721. 

[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.

[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.

[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP,
Leprince D; Cell Growth Differ 1995;6:1495-1503.

[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484. 

[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702. 

62. (Bac GSPproteins) Bacterial type II secretion system protein D signature 
A number of bacterial proteins, some of which are involved in a general secretion pathway
(GSP) for the export of proteins (also called the type II pathway) [1 to 5], have been found to
be evolutionarily related. These proteins are listed below:

- The 'D' protein from the GSP operon of: Aeromonas (gene exeD); Erwinia (gene outD); Escherichia coli (gene yheF); Klebsiella pneumoniae (gene pulD); Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and Xanthomonas campestris (gene xpsD).
- comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili.
- hofQ (hopQ) from Escherichia coli.
- hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the hypersensitivity response in plants.
- hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the hypersensitivity response.
- mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which are necessary for penetration of intestinal epithelial cells.
- omc from Neisseria gonorrhoeae.
- yssC from Yersinia enterocolitica virulence plasmid pYV, which seems to be required for the export of the Yop virulence proteins.
- The gpIV protein from filamentous phages such as f1, ike, or m13. GpIV is said to be involved in phage assembly and morphogenesis.

These proteins all seem to start with a signal sequence and are thought to be integral proteins
in the outer membrane. As a signature pattern, a conserved region in the C-terminal section of
these proteins has been selected.

Consensus pattern: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-[LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F

[ 1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[ 2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas P., Mulholland V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-456(1993).
[ 3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[ 4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[ 5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

63. (Bac globin) Protozoan/cyanobacterial globins signature 

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. 
Almost all globins belong to a large family (see <PDOC00793>); the only exceptions are the
following proteins, which form a family of their own [2,3]:

- Monomeric hemoglobins from the protozoans Paramecium caudatum, Tetrahymena pyriformis
and Tetrahymena thermophila.
- Cyanoglobin from the cyanobacterium Nostoc commune.
- Globins LI637 and LI410 from the chloroplast of the alga Chlamydomonas eugametos.
- Mycobacterium tuberculosis hypothetical protein MtCY48.23.

These proteins contain a conserved histidine which could be involved in heme-binding. As a
signature pattern, a conserved region that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York
(1988).

[ 2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).

[ 3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M. Mol. Gen. Genet.
243:185-197(1994).

64. Band 7 protein family signature 

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane 
phosphoprotein of red blood cells thought to regulate cation conductance by interacting with 
other proteins of the junctional complex of the membrane skeleton. Structurally, band 7 is
evolutionarily related to the following proteins:

- Caenorhabditis elegans protein mec-2 [2]. Mec-2 positively regulates the activity of the
putative mechanosensory transduction channel. It may link the mechanosensory channel and
the microtubule cytoskeleton of the touch receptor neurons.
- Caenorhabditis elegans proteins sto-1 to sto-4.
- Caenorhabditis elegans protein unc-1.
- Escherichia coli hypothetical protein ybbK.
- Mycobacterium tuberculosis hypothetical protein MtCY277.09.
- Synechocystis strain PCC 6803 hypothetical protein slr1128.
- Methanococcus jannaschii hypothetical protein MJ0827.

Structurally, all these proteins consist of a short N-terminal domain which is followed by a
transmembrane region and a variable size (from 170 to 350 residues) C-terminal domain. As a
signature pattern, a conserved region located about 110 residues after the transmembrane
domain was selected.

Consensus pattern: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-
[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR]

[ 1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[ 2] Huang M., Gu G., Ferguson E.L., Chalfie M. Nature 378:292-295(1995).

65. Barwin domain signatures

Barwin [1] is a barley seed protein of 125 residues that weakly binds a chitin analog. It
contains six cysteines involved in disulfide bonds, as shown in the following schematic
representation.

xxxxxxxxxxxxxxxCxxxxxxxxxxCxxxxCxCxxxxxxxxCxxxxxxxxxxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the patterns.
[The lines marking the disulfide bridges and the pattern positions are not recoverable from the source.]

Barwin is closely related to the following proteins:

- Hevein, a wound-induced protein found in the latex of rubber trees.
- HEL, an Arabidopsis thaliana hevein-like protein [2].
- Win1 and win2, two wound-induced proteins from potato.
- Pathogenesis-related protein 4 from tobacco.

Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain followed by a
barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense
mechanism in plants. As signature patterns, two highly conserved regions that contain some of
the cysteines were selected.

Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two Cs are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[ 1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M.
Biochemistry 31:8767-8770(1992).

[ 2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward
E., Ryals J. Mol. Plant Microbe Interact. 6:680-685(1993).

66. (Bowman-Birk leg) Bowman-Birk serine protease inhibitors family signature 
The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase
inhibitors. As can be seen in the schematic representation, they have a duplicated structure
and generally possess two distinct inhibitory sites:

xxCCxxCxxCxx#xxCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx
< 70 residues >

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.
[The lines marking the disulfide bridges and the pattern position are not recoverable from the source.]

These inhibitors are found in the seeds of all leguminous plants as well as in 
cereal grains. In cereals they exist in two forms, one of which is a 
duplication of the basic structure shown above [2]. The pattern that was developed 
to pick up sequences belonging to this family of inhibitors is in the central 
part of the domain and includes four cysteines. 

Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-
[NDKS]-[DEKRHSTA]-C [The four Cs are involved in disulfide bonds]
Note: this pattern can be found twice in some duplicated cereal inhibitors.
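
Since the pattern may occur twice in duplicated cereal inhibitors, a scan should report every match rather than only the first. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above, and the function name and the synthetic sequences are assumptions, not real inhibitor sequences.

```python
import re

# Hand-translated from the consensus pattern above: x(5,6) becomes .{5,6}.
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)

def signature_starts(sequence: str) -> list:
    """Return the 1-based start of every non-overlapping match."""
    return [m.start() + 1 for m in BOWMAN_BIRK.finditer(sequence)]

# Synthetic sequences (assumptions for illustration, not real inhibitors).
unit = "CAAAAADCPPACNDC"
single = "MK" + unit + "GG"
duplicated = "MK" + unit + "GGGG" + unit + "GG"

print(signature_starts(single))      # one start position
print(signature_starts(duplicated))  # two start positions
```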

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[ 2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

67. Pathogenesis-related protein Bet v I family signature 

A number of plant proteins, which all seem to be involved in pathogen defense 
response, are structurally related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of 
type I allergic reactions in Europe, North America and USSR. 

- Aln g I, the major pollen allergen from alder. 

- Api g I, the major allergen from celery.

- Car b I, the major pollen allergen from hornbeam.

- Cor a I, the major pollen allergen from hazel.

- Mal d I, the major pollen allergen from apple.

- Asparagus wound-induced protein AoPR1.

- Kidney bean pathogenesis-related proteins 1 and 2. 

- Parsley pathogenesis-related proteins PR1-1 and PR1-3. 

- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.

- Pea abscisic acid-responsive proteins ABR17 and ABR18. 

- Potato pathogenesis-related proteins STH-2 and STH-21.

- Soybean stress-induced protein SAM22.

These proteins are thought to be intracellularly located. They contain from 155 to 160
amino acid residues. As a signature pattern, a conserved region located in the third quarter of
these proteins has been selected.

Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-
x(2)-K-x(4)-[FY]

[1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O.,
Breitenbach M. EMBO J. 8:1935-1938(1989).

[2] Crowell D., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).
[3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

68. bZIP transcription factors basic domain signature 

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together
proteins that contain a basic region mediating sequence-specific DNA-binding followed by a
leucine zipper required for dimerization. This family is quite large; therefore, only a partial
list of some representative members appears here.

- Transcription factor AP-1, which binds selectively to enhancer elements in the cis control
regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog
of the avian sarcoma virus 17 (ASV17) oncogene v-jun.
- Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun.
- The fos-related proteins fra-1 and fos B.
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4,
ATF-5, ATF-6 and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the
production of zein proteins during endosperm.
- Arabidopsis G-box binding factors GBF1 to GBF4, parsley CPRF-1 to CPRF-3, tobacco TAF-1 and
wheat EMBP-1. All these proteins bind the G-box promoter elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the kruppel and knirps
segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat
body-specific enhancers of alcohol dehydrogenase and yolk protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head
morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral
blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system that regulates
the expression of amino acid-synthesizing enzymes in response to amino acid starvation, and
the related Neurospora crassa cpc-1 protein.
- Neurospora crassa cys-3, which turns on the expression of structural genes which encode
sulfur-catabolic enzymes.
- Yeast MET28, a transcriptional activator of sulfur amino acid metabolism.
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen
detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]

[ 1] Hurst H.C. Protein Prof. 2:105-168(1995).
[ 2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).

69. Biotin-requiring enzymes attachment site 

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is 
covalently attached, via an amide bond, to a lysine residue in enzymes 
requiring this coenzyme [1,2,3,4]. Such enzymes are: 

- Pyruvate carboxylase (EC 6.4.1.1). 

- Acetyl-CoA carboxylase (EC 6.4.1.2). 

- Propionyl-CoA carboxylase (EC 6.4.1.3). 

- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4). 

- Geranoyl-CoA carboxylase (EC 6.4.1.5). 

- Urea carboxylase (EC 6.3.4.6). 

- Oxaloacetate decarboxylase (EC 4.1.1.3). 

- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41). 

- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).

- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine)
residue is well conserved and can be used as a signature pattern.

Consensus pattern: [GN]-[DEQTR]-x-[LIVM

[LIVM]-x-[SAV] [K is the biotin attachment site]
Note: the domain around the biotin-binding lysine residue is evolutionarily related to that
around the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.

[ 1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989). 

[ 2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol.
Chem. 263:6461-6464(1988). 

[ 3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984). 

[ 4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol.
Chem. 267:18407-18412(1992). 

2-oxo acid dehydrogenases acyltransferase component lipoyl binding site 
The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and 
eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to 
the corresponding acyl-CoA. The three members of this family of multienzyme 
complexes are: 

- Pyruvate dehydrogenase complex (PDC). 

- 2-oxoglutarate dehydrogenase complex (OGDC). 

- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC). 

These three complexes share a common architecture: they are composed of 
multiple copies of three component enzymes - E1, E2 and E3. E1 is a thiamine
pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipoamide
acyltransferase, and E3 an FAD-containing dihydrolipoamide dehydrogenase.
E2 acyltransferases have an essential cofactor, lipoic acid, which is
covalently bound via an amide linkage to a lysine group. The E2 components of
OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one
(in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in 
Escherichia coli) lipoyl groups [3]. 

In addition to the E2 components of the three enzymatic complexes described
above, a lipoic acid cofactor is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme
complex of four protein components, which catalyzes the degradation of 
glycine. H protein shuttles the methylamine group of glycine from the P 
protein to the T protein. H-protein from either prokaryotes or eukaryotes 
binds a single lipoic group. 

- Mammalian and yeast pyruvate dehydrogenase complexes differ from that of 
other sources, in that they contain, in small amounts, a protein of unknown 
function - designated protein X or component X. Its sequence is closely 
related to that of E2 subunits and seems to bind a lipoic group [5]. 

- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6]. 
This protein is most probably a dihydrolipoamide acyltransferase involved in
acetoin metabolism. 

A signature pattern was developed which allows the detection of the lipoyl- 
binding site. 

Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-
[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site]
Note: the domain around the lipoyl-binding lysine residue is evolutionarily related to that
around the biotin-binding lysine residue of biotin-requiring enzymes.
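
When a pattern singles out a functional residue, such as the lipoyl-binding lysine here, a capture group can report where that residue falls in the scanned sequence. This is a sketch under stated assumptions: the regular expression is a hand translation of the consensus pattern above, and the function name and the synthetic sequence are hypothetical.

```python
import re

# The lysine (K) is wrapped in a capture group so its position can be reported.
LIPOYL_SITE = re.compile(
    r"[GN].{2}[LIVF].{5}[LIVFC].{2}[LIVFA].{3}(K)"
    r"[STAIV][STAVQDN].{2}[LIVMFS].{5}[GCN].[LIVMFY]"
)

def lipoyl_lysine_positions(sequence: str) -> list:
    """Return 1-based positions of candidate lipoyl-binding lysines."""
    return [m.start(1) + 1 for m in LIPOYL_SITE.finditer(sequence)]

# Synthetic sequence (an assumption for illustration, not a real E2 component).
toy = "MS" + "GAALAAAAALAALAAAKSTAALAAAAAGAL" + "EE"
print(lipoyl_lysine_positions(toy))
```

A match only flags a candidate site; whether the lysine is actually lipoylated is not something a pattern scan can establish.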

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989).

[ 2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).

[ 3] Russell G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).

[ 4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).

[ 5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A.
86:8732-8736(1989).

[ 6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol.
173:4056-4071(1991).

70. C2 (C2 domain) Number of members: 295 

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 
116 amino-acid residues which is located between the two copies of the C1 domain (that
bind phorbol esters and diacylglycerol) (see <PDOC00379>) and the protein kinase
catalytic domain (see <PDOC00100>). Regions with significant homology [3,E1] to the
C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2. 

- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 
have a C2-like domain at the N-terminal extremity [4]. 

- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain. 

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) 
isoforms beta, gamma and delta as well as several non-mammalian PI-PLCs have a C2-like 
domain C-terminal of the catalytic domain. 

- Mammalian and plant phosphatidylinositol 3-kinases have a C2-like domain in the central
region of the 110 Kd catalytic subunit.

- Yeast phosphatidylserine-decarboxylase 2 (gene PSD2) contains a C2 domain in its central 
region. 

- Cytosolic phospholipase D from plants and cytosolic phospholipase A2 have a C2-like 
domain at their N-terminus. 

- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic 
phospholipids and that may have a regulatory role in the membrane interactions during 
trafficking of synaptic vesicles at the active zone of the synapse. All isoforms of 
synaptotagmins have two copies of the C2 domain in their C-terminal region. 

- Rabphilin-3A, a synaptic protein, contains two C2 domains.

- Caenorhabditis elegans protein unc-13 whose function is not known. Unc-13 has a C2
domain in its central part and a C2-like domain at the C-terminus. 

- rasGAP and the breakpoint cluster protein bcr have a C2-domain C-terminal of a PH- 
domain. 

- Yeast protein BUD2 (or CLA2) has a C2-domain in the central region. 

- Yeast protein RSP5 and human protein NEDD-4, both proteins also contain WW domains 
(see <PDOC50020>). 

- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only
extracellular protein known to contain a C2 domain. 

- Yeast hypothetical protein YML072C has a C2 domain. 

- Yeast hypothetical protein YNL087W has three C2 domains. 

- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5].
Since domains related to the C2 domain are also found in proteins that do not bind calcium,
other putative functions for the C2 domain, such as binding to inositol-1,3,4,5-tetraphosphate,
have been suggested [6]. Recently, the 3D structure of the first C2 domain of
synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The
signature pattern that has been developed for the C2 domain is located in a conserved part of 
that domain, the connecting loop between beta strands 2 and 3. A profile has been 
developed for the C2 domain that covers the total domain. 

-Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-
[FY]

-Note: this documentation entry is linked to both a signature pattern and a profile. As the 
profile is much more sensitive than the pattern, you should use it if you have access to the 
necessary software tools to do so. 
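
The note above contrasts a pattern, which either matches or does not, with a profile, which scores every position. The sketch below illustrates that idea with a toy position-specific log-odds matrix. It is not the PROSITE profile format, which also models gaps; the training segments, uniform background, pseudocount and function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (assumption)

def build_profile(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned segments."""
    profile = []
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the log-odds score of each residue of a segment of profile length."""
    return sum(column[aa] for column, aa in zip(profile, segment))

# Toy aligned segments (hypothetical, for illustration only).
training = ["ACDL", "ACDI", "GCDL"]
profile = build_profile(training)

for candidate in ("ACDL", "ACEL", "WWWW"):
    print(candidate, round(score(profile, candidate), 2))
```

A segment with one mismatch still receives an intermediate score, whereas a strict pattern would reject it outright; this graded behavior is one reason a profile can be more sensitive.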

[1] Medline: 96367095 Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and
theta, phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992). 
[ 2] Stabel S. Semin. Cancer Biol. 5:277-284(1994). 

[ 3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995). 

[ 4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993). 

[ 5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993). 

[ 6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem.
269:29206-29211(1994).

[ 7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell
80:929-938(1995).

71. CAP (CAP protein) Number of members: 11 

In budding and fission yeasts the CAP protein is a bifunctional protein whose N-terminal 
domain binds to adenylyl cyclase, thereby enabling that enzyme to be activated by upstream 
regulatory signals, such as Ras. The function of the C-terminal domain is less clear, but it is
required for normal cellular morphology and growth control [1]. CAP is conserved in
higher eukaryotic organisms where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated
by a proline-rich hinge. Two signature patterns, one corresponding to a conserved region in
the N-terminal extremity and the other to a C-terminal region, have been developed.

-Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E 

-Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K 
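
Where a family is described by two signatures, a stricter test is to require both, in the expected order. The following is a minimal sketch, assuming hand-translated regular expressions for the two CAP patterns above. The function name and the synthetic sequences are hypothetical.

```python
import re

# Hand translations of the two CAP consensus patterns above.
CAP_N_TERMINAL = re.compile(r"[LIVM]{2}.RL[DE].{4}RLE")
CAP_C_TERMINAL = re.compile(r"D[LIVMFY].E.[PA].PEQ[LIVMFY]K")

def has_cap_signatures(sequence: str) -> bool:
    """True when both signatures occur and the first N-terminal match
    ends before the first C-terminal match begins."""
    n_hit = CAP_N_TERMINAL.search(sequence)
    c_hit = CAP_C_TERMINAL.search(sequence)
    return bool(n_hit and c_hit and n_hit.end() <= c_hit.start())

# Synthetic motifs and sequences (assumptions, not real CAP proteins).
n_motif = "LLARLDAAAARLE"
c_motif = "DLAEAPAPEQLK"
in_order = "M" + n_motif + "G" * 20 + c_motif
swapped = "M" + c_motif + "G" * 20 + n_motif

print(has_cap_signatures(in_order))
print(has_cap_signatures(swapped))
```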

[ 1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. 
Cell 3:167-180(1992). 

[ 2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

72. CAP_GLY (CAP-Gly domain)

CAP stands for cytoskeleton-associated proteins. Swiss:P39937 may be a member but has not 
been included. It has a weak match to the family between residues 22-67. Number of 
members: 24 

[1] Medline: 93242656. Sequence homologies between four cytoskeleton-associated proteins.
Riehemann K, Sorg C; Trends Biochem Sci 1993;18:82-83. 

It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence 
of a conserved, glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins 
known to contain this domain are listed below. 

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein
associated with intermediate filaments that links endocytic vesicles to microtubules.
Restin contains two copies of the CAP-Gly domain.

- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila 
glued, a major component of activator I, a 20S polypeptide complex that stimulates 
dynein-mediated vesicle transport. 

- Yeast protein BIK1 which seems to be required for the formation or stabilization of 
microtubules during mitosis and for spindle pole body fusion during conjugation. 

- Yeast protein NIP100 (NIP80).

- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and
Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal
ubiquitin domain (see <PDOC00271>) and a C-terminal CAP-Gly domain.

- Caenorhabditis elegans hypothetical protein M01A8.2. 

- Yeast hypothetical protein YNL148c. 

Structurally, these proteins are made of three distinct parts: an N-terminal section that is 
most probably globular and contains the CAP-Gly domain, a large central region 
predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C- 
terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 
residues of the domain and includes five of the six conserved glycines. 

-Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-
[STAR]-x(2)-G-x(2)-[LY]-F

[ 1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).

73. (CBD 1) Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as
endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1].

Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a 
cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy- 
amino acids. 

The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid 
residues. Enzymes known to contain such a domain are: 

- Endoglucanase I (gene egl1) from Trichoderma reesei.

- Endoglucanase II (gene egl2) from Trichoderma reesei. 

- Endoglucanase V (gene egl5) from Trichoderma reesei. 

- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa,
Phanerochaete chrysosporium, Trichoderma reesei, and Trichoderma viride.

- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.

- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.

- Endoglucanases B, C2, F and K from Fusarium oxysporum.

The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal
extremity (Cbh-I, egl1 or egl5) of these enzymes. As shown in the following schematic
representation, there are four conserved cysteines in this type of CBD domain, all involved in
disulfide bonds.

xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
[The lines marking the disulfide bridges and the pattern position are not recoverable from the source.]



Such a domain has also been found in a putative polysaccharide binding protein from the red 
alga, Porphyra purpurea [2]. Structurally, this protein consists of four tandem repeats of the 
CBD domain. 



Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The
four Cs are involved in disulfide bonds]
Sequences known to belong to this class detected by the pattern: ALL.



[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 2] Liu Q., van der Meer J.P., Reith M.E.



74. CBS domain. 3D Structure found as a subdomain in TIM barrel of inosine-.
CBS domains are small intracellular modules mostly found in two or four copies
within a protein. CBS domains are found in cystathionine-beta-synthase (CBS) where
mutations lead to homocystinuria. Two CBS domains are found in inosine-monophosphate
dehydrogenase from all species; however, the CBS domains are not needed for activity. Two
CBS domains are found in intracellular loops of several chloride channels. Mutations in this
domain of Swiss:P35520 lead to homocystinuria.
Number of members: 414 

[1] Medline: 97172695 The structure of a domain common to archaebacteria and the
homocystinuria disease protein. Bateman A; Trends Biochem Sci 1997;22:12-13.
[2] Medline: 96279836 Structure and mechanism of inosine monophosphate dehydrogenase
in complex with the immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA,
Futer O, Raybuck SA, Chambers SP, Caron PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.

Discovery of CBS domain.

[3] Medline: 97259972 CBS domains in ClC chloride channels implicated in myotonia and
nephrolithiasis (kidney stones). Ponting CP; J Mol Med 1997;75:160-163.

75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)

All of these members have the ability to catalyze the displacement of CMP from a CDP- 
alcohol by a second alcohol with formation of a phosphodiester bond and concomitant 
breaking of a phosphoride anhydride bond. Number of members: 32

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis
and that share the property of catalyzing the displacement of CMP from a CDP-alcohol by a
second alcohol with formation of a phosphodiester bond and concomitant breaking of a
phosphoride anhydride bond, share a conserved sequence region [1,2]. These enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1). 

- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1). 

- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3- 
phosphate 3-phosphatidyltransferase) from bacteria (gene pgsA). 

- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol-serine O-
phosphatidyltransferase) from yeast (gene CHO1) and from Bacillus subtilis (gene pssA).

- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol-inositol 3-
phosphatidyltransferase) from yeast (gene PIS).

These enzymes are proteins of 200 to 400 amino acid residues. The conserved
region contains three aspartic acid residues and is located in the N-terminal section of the
sequences.

-Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

[1] Medline: 97075020 Two-dimensional 1H-NMR of transmembrane peptides from
Escherichia coli phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP,
Hauksson JB, Rilfors L, Arvidson G, Lindblom G; Eur J Biochem 1996;241:489-497.

[ 1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).

[ 2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).

76. CHOD (Cholesterol oxidase) Members of the GMC oxidoreductase family. Number of 
members: 3 

[1] Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid
substrate: implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J,
Vrielink A, Brick P, Blow DM; Biochemistry 1993;32:11507-11515.

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily
related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose
+ oxygen -> delta-gluconolactone + hydrogen peroxide.

- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol +
oxygen -> formaldehyde + hydrogen peroxide.

- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline +
unknown acceptor -> betaine aldehyde + reduced acceptor.

- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed:
glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.

- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and
Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen ->
cholest-4-en-3-one + hydrogen peroxide.

- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts
aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme
involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid 
residues which share a number of regions of sequence similarities. One of these regions, 
located in the N-terminal section, corresponds to the FAD ADP-binding domain. The
function of the other conserved domains is not yet known; two of these domains have been 
selected as signature patterns. The first one is located in the N-terminal section of these 
enzymes, about 50 residues after the ADP-binding domain, while the second one is 
located in the central section. 

-Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-
[PAG]-x(5)-[DNESH]

-Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G 

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).

[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol.
6:3121-3136(1992).

[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11.
Cyclin-dependent kinases (CDK) are protein kinases which associate with cyclins to regulate
eukaryotic cell cycle progression. The best-known CDK is p34-cdc2 (CDC28 in yeast),
which is required for entry into S-phase and mitosis. CDKs bind to a regulatory subunit
which is essential for their biological function. This regulatory subunit is a small protein of
79 to 150 residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform
is known, while mammals have two highly related isoforms. It has been shown [1] that
these CDK regulatory subunits assemble as a hexamer which then acts as a hub for the
oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are
highly conserved; therefore, the two most conserved regions have been used as signature
patterns.

-Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]
-Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

[ 1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).

78. CK_II_beta (Casein kinase II regulatory subunit) 

Number of members: 16

Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic
serine/threonine protein kinase which is found both in the cytoplasm and the nucleus and
whose substrates are numerous. It generally phosphorylates Ser or Thr N-terminal
to a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer
composed of two catalytic subunits (alpha) and two regulatory subunits (beta). In most
species there are two closely related isoforms of the catalytic subunit: alpha and alpha'.
Some species, such as fungi and plants, express two forms of regulatory subunits: beta and
beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved
protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could
be involved in binding a metal such as zinc [2]. This region has been used as a signature
pattern.

-Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C

[ 1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).

[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).

79. CLP_protease (Clp protease) 

These proteins belong to family S14 in the classification of peptidases. 

The Clp protease has an active site catalytic triad. In E. coli Clp protease, Ser-111,
His-136 and Asp-185 form the catalytic triad.
-!- Swiss:P48254 has lost all of these active site residues and is therefore inactive.
-!- Swiss:P42379 contains two large insertions; Swiss:P42380 contains one large insertion.
Number of members: 38

The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various
proteins in a process that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which
consists of a proteolytic subunit (gene clpP) and either of two related ATP-binding regulatory
subunits (genes clpA and clpX). ClpP is a serine protease which has a chymotrypsin-like
activity. Its catalytic activity seems to be provided by a charge relay system similar to that
of the trypsin family of serine proteases, but which arose independently by convergent
evolution. Proteases highly similar to ClpP have been found to be encoded in the genome of
the chloroplast of plants and seem to be also present in other eukaryotes. The sequences
around two of the residues involved in the catalytic triad (a serine and a histidine) are
highly conserved and can be used as signature patterns specific to that category of
proteases.

-Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active
site residue]

-Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site
residue]
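
Where a pattern marks an active site residue, the position of that residue in a matching sequence can be recovered by placing a capture group around it. The Python sketch below does this for the serine signature, read as T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] and hand-translated into a regular expression. The function name and the test sequence are hypothetical, introduced only for illustration.

```python
import re

# Hand-translated serine signature; the parenthesised S is the residue
# annotated as the active site, captured as group 1.
CLPP_SER_REGEX = re.compile(r"T.{2}[LIVMF]G.A[SAC](S)[MSA][PAG][STA]")


def active_site_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the candidate active-site serine."""
    return [m.start(1) + 1 for m in CLPP_SER_REGEX.finditer(sequence)]


# Made-up sequence for demonstration; not a real ClpP protein.
hypothetical_sequence = "MKTAAIGQAASMGAFK"
print(active_site_positions(hypothetical_sequence))
```

A reported position is only a candidate: as noted above for Swiss:P48254, a family member can lose the catalytic residues and be inactive, so the residue identity still has to be confirmed independently.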

[1] Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model
for ATP-dependent proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[ 1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem.
265:12546-12552(1990).
[ 2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).



80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel)
[1] Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion
channels. Yau KW; Proc Natl Acad Sci USA 1994;91:3481-3483.
This family is found N-terminal to the cNMP_binding domain. Number of members: 56.
Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about
120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene
activator (also known as the cAMP receptor protein) (gene crp) where such a domain
is known to be composed of three alpha-helices and a distinctive eight-stranded,
antiparallel beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases
contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are
composed of two different subunits: a catalytic chain and a regulatory chain which contains
both copies of the domain. The cGPKs are single chain enzymes that include the two copies
of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is
due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in
cGPK is an alanine in most cAPK.

- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been
fully characterized. One is found in rod cells where it plays a role in visual signal
transduction. It specifically binds cGMP, leading to an opening of the channel and
thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues
that are thought to be essential for maintenance of the structural integrity of the
beta-barrel. Two signature patterns have been developed for this domain. The first pattern
is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second
pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well
as the three other invariant residues.

-Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G
-Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-
[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor) 
[1] Medline: 95191390
Biosynthesis and functional role of haem O and haem A
Mogi T, Saiki K, Anraku Y; Mol Microbiol 1994;14:391-398.
Cytochrome c oxidase is a multi-subunit enzyme. The complexity
of this enzyme requires assistance in building the complex.
This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31

Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require
the aid of a number of proteins that either act as chaperonins to help the
subunits of the enzyme to fold correctly, or assist in the assembly of the
metal centers [1]. One of these subunits is known as COX10 in yeast and as
ctaB [2] in aerobic prokaryotes. It is evolutionarily related to cyoE protein
from the Escherichia coli cytochrome O terminal oxidase complex.

These proteins probably contain [3] seven transmembrane segments. The most
conserved region is located in a loop between the second and third of these
segments and has been selected as a signature pattern.

-Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G 

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.
J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S.
J. Biol. Chem. 267:24273-24278(1992).
[ 3] Chepuri V., Gennis R.B.
J. Biol. Chem. 265:12978-12986(1990).

82. COX3 (Cytochrome c oxidase subunit III) 
This family corresponds to chains c and p.
[1] Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c
oxidase at 2.8 A. Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S; Science 1996;272:1136-1144.

83. COX5B (Cytochrome c oxidase subunit Vb)
[1] Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c
oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
This family consists of chains F and S.
Number of members: 10

Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which
is a component of the respiratory chain complex and is involved in the
transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme
complex is located in the mitochondrial inner membrane; in aerobic prokaryotes
it is found in the plasma membrane. In addition to the three large subunits
that form the catalytic center of the enzyme complex there are, in eukaryotes,
a variable number of small polypeptidic subunits. One of these subunits, which
is known as Vb in mammals, V in slime mold and IV in yeast, binds a zinc atom.
The sequence of subunit Vb is well conserved and includes three conserved
cysteines that are thought to coordinate the zinc ion [2]. Two of these
cysteines are clustered in the C-terminal section of the subunit; this region
has been selected as a signature pattern.

-Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two Cs 
probably bind zinc] 

[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R.
Biochim. Biophys. Acta 1129:100-104(1991).

84. COesterase (Carboxylesterases)
Cholinesterase pages 
The prints entry is specific to acetylcholinesterase
Number of members: 273

Higher eukaryotes have many distinct esterases. Among the different types are
those which act on carboxylic esters (EC 3.1.1.-). Carboxyl-esterases have
been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a
number of type-B carboxylesterases indicates [1,2,3] that the majority are
evolutionarily related. This family currently consists of the following
proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from
Drosophila.

- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8).
Acetylcholinesterase and cholinesterase II are closely related enzymes that
hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the
male insect reproductive system where it plays an important role in its
reproductive biology.

- Drosophila esterase P.

- Culex pipiens (mosquito) esterases B1 and B2.

- Myzus persicae (peach-potato aphid) esterases E4 and FE4.

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase
which catalyzes fat and vitamin absorption. It is activated by bile salts
in infant intestine where it helps to digest milk fats.

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that
may be associated with peroxisome proliferation and may play a role in the
production of 3-hydroxy fatty acid diester pheromones.

- Membrane enclosed crystal proteins from slime mold. These proteins are
most probably esterases; the vesicles where they are found have therefore
been termed esterosomes.

So far two bacterial proteins have been found to belong to this family: 

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans
plasmid-encoded enzyme (gene pcd) that degrades the phenylcarbamate
herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

The following proteins, while having lost their catalytic activity, contain a
domain evolutionarily related to that of carboxylesterases type-B:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is
the precursor of the iodinated thyroid hormones thyroxine (T4) and
triiodothyronine (T3).

- Drosophila protein neurotactin (gene nrt) which may mediate or modulate cell
adhesion between embryonic cells during development.

- Drosophila protein glutactin (gene glt), whose function is not known.

As is the case for lipases and serine proteases, the catalytic apparatus of
esterases involves three residues (catalytic triad): a serine, a glutamate or
aspartate and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in
the N-terminal section containing a cysteine involved in a disulfide bond
has been selected as a second signature pattern.

-Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site
residue]
-Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is
involved in a disulfide bond]

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).

[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A.
88:6647-6651(1991).

[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P.
Protein Sci. 2:366-382(1993).

[ 4] Lockridge O. BioEssays 9:125-128(1988).

[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase)) 
[1] Medline: 94347758
Three-dimensional structure of the biotin carboxylase subunit
of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;
Biochemistry 1994;33:10249-10256.
[2] Medline: 90285162
Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and
evolution of the CPS domain of the Syrian hamster multifunctional
protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of
carbamyl-phosphate from glutamine or ammonia and bicarbonate. This
important enzyme initiates both the urea cycle and the biosynthesis
of arginine and/or pyrimidines [2].
The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a
heterodimer of a small and large chain. The small chain promotes
the hydrolysis of glutamine to ammonia, which is used by the large
chain to synthesize carbamoyl phosphate. See CPSase_sm_chain.
The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 181

Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of
carbamyl-phosphate from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and
bicarbonate [1]. This important enzyme initiates both the urea cycle and the
biosynthesis of arginine and pyrimidines.

Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of
pyrimidines and purines. In bacteria such as Escherichia coli, a single enzyme
is involved in both biosynthetic pathways while other bacteria have separate
enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene
carA) that provides glutamine amidotransferase activity (GATase) necessary for
removal of the ammonia group from glutamine, and a large chain (gene carB)
that provides CPSase activity. Such a structure is also present in fungi for
arginine biosynthesis (genes CPA1 and CPA2). In most eukaryotes, the first
three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and
CAD in mammals [2]. The CPSase domain is located between an N-terminal GATase
domain and the C-terminal part which encompasses the dihydroorotase and
aspartate transcarbamylase activities.

Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic
vertebrates; it is a monofunctional protein located in the mitochondrial
matrix.

The CPSase domain is typically 120 Kd in size and has arisen from the
duplication of an ancestral subdomain of about 500 amino acids. Each subdomain
independently binds to ATP and it is suggested that the two homologous halves
act separately, one to catalyze the phosphorylation of bicarbonate to carboxy
phosphate and the other that of carbamate to carbamyl phosphate.

The CPSase subdomain is also present in a single copy in the biotin-dependent
enzymes acetyl-CoA carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase
(EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1) (PC) and urea
carboxylase (EC 6.3.4.6).

Two conserved regions which are probably important for binding ATP and/or catalytic
activity have been selected as signatures for the subdomain.

-Consensus pattern: [FW]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-
[SG]-G-x-[AG]

-Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.
J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.
BioEssays 15:157-164(1993).

86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain) 
[1] Medline: 90285162
Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and
evolution of the CPS domain of the Syrian hamster multifunctional
protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
The carbamoyl-phosphate synthase domain is in the amino terminus of the
protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of
carbamyl-phosphate from glutamine or ammonia and bicarbonate. This
important enzyme initiates both the urea cycle and the biosynthesis
of arginine and/or pyrimidines [1].
The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a
heterodimer of a small and large chain. The small chain promotes
the hydrolysis of glutamine to ammonia, which is used by the large
chain to synthesize carbamoyl phosphate. See CPSase_L_chain.
The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 46

Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of
carbamyl-phosphate from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and
bicarbonate [1]. This important enzyme initiates both the urea cycle and the
biosynthesis of arginine and pyrimidines.

Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of
pyrimidines and purines. In bacteria such as Escherichia coli, a single enzyme
is involved in both biosynthetic pathways while other bacteria have separate
enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene
carA) that provides glutamine amidotransferase activity (GATase) necessary for
removal of the ammonia group from glutamine, and a large chain (gene carB)
that provides CPSase activity. Such a structure is also present in fungi for
arginine biosynthesis (genes CPA1 and CPA2). In most eukaryotes, the first
three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and
CAD in mammals [2]. The CPSase domain is located between an N-terminal GATase
domain and the C-terminal part which encompasses the dihydroorotase and
aspartate transcarbamylase activities.

Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic
vertebrates; it is a monofunctional protein located in the mitochondrial
matrix.

The CPSase domain is typically 120 Kd in size and has arisen from the
duplication of an ancestral subdomain of about 500 amino acids. Each subdomain
independently binds to ATP and it is suggested that the two homologous halves
act separately, one to catalyze the phosphorylation of bicarbonate to carboxy
phosphate and the other that of carbamate to carbamyl phosphate.

The CPSase subdomain is also present in a single copy in the biotin-dependent
enzymes acetyl-CoA carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase
(EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1) (PC) and urea
carboxylase (EC 6.3.4.6).

Two conserved regions which are probably important for binding ATP and/or catalytic
activity have been selected as signatures for the subdomain.

-Consensus pattern: [FW]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-
[SG]-G-x-[AG]

-Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.
J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.
BioEssays 15:157-164(1993).

87. CRAL_TRIO (CRAL/TRIO domain)
[1] Medline: 98121119
Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer
protein.
Sha B, Phillips SE, Bankaitis VA, Luo M;
Nature 1998;391:506-510.
The original profile has been extended to include the carboxyl
domain from the known structure of Sec14. Swiss:P10911 has not
been included in the Pfam family because it does not appear to
contain a complete structural domain.
Number of members: 39



88. CSD ('Cold-shock' DNA-binding domain)
[1] Medline: 94255482
Crystal structure of CspA, the major cold shock
protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123.
Number of members: 121

A conserved domain of about 70 amino acids has been found in prokaryotic and 
eukaryotic DNA-binding proteins [1,2,3,E1]. This domain, which is known as the 
'cold-shock domain' (CSD) is present in the proteins listed below. 

- Escherichia coli protein CS7.4 (gene cspA) which is induced in response to
low temperature (cold-shock protein) and which binds to and stimulates the
transcription of the CCAAT-containing promoters of the H-NS protein and of
gyrA.

- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT- 
containing Y box of mammalian HLA class II genes. 

- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to
the CCAAT-containing Y box of Xenopus hsp70 genes.

- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element 
of genes transcribed by RNA polymerase III. 

- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the
CCAAT motif in various gene promoters.

- DbpA, a human DNA-binding protein of unknown specificity.

- Bacillus subtilis cold-shock proteins cspB and cspC. 

- Streptomyces clavuligerus protein SC 7.0. 

- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF. 

- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine 
repeats that are similar to the CSD domain. The function of Unr is not yet 
known but it could be a multivalent DNA-binding protein. 

The most conserved region of the CSD domain, which is located in its N-terminal section,
has been selected as a signature pattern. It must be noted that the
beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

-Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

89. CTF_NFI (CTF/NF-I family) 
Number of members: 45 

Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] 
(also known as TGGCA-binding proteins) are a family of vertebrate nuclear 
proteins which recognize and bind, as dimers, the palindromic DNA sequence 
5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular
promoters and in the origin of DNA replication of Adenovirus type 2. 
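
A DNA site is palindromic in this sense when it equals its own reverse complement, which is consistent with the two subunits of a dimer each reading one half-site. The short Python sketch below checks this property for the TGGCANNNTGCCA site quoted above; the helper names are illustrative only, and treating the unspecified base N as its own complement is an assumption made for this example.

```python
# Watson-Crick complements; N (any base) is mapped to itself by assumption.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """True when the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


CTF_NFI_SITE = "TGGCANNNTGCCA"
print(is_palindromic(CTF_NFI_SITE))

# The right half-site is the reverse complement of the left half-site.
print(reverse_complement("TGGCA"))
```

The second call shows why the site is symmetric: the left half-site TGGCA and the right half-site TGCCA are reverse complements of each other, separated by three unspecified bases.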

The CTF/NF-I proteins were first identified as nuclear factor I, a collection 
of proteins that activate the replication of several Adenovirus serotypes 
(together with NF-II and NF-III) [3]. The family of proteins was also 
identified as the CTF transcription factors, before the NFI and CTF families 
were found to be identical [4]. The CTF/NF-I proteins are individually capable 
of activating transcription and DNA replication. The CTF/NF-I family is
also referred to as NFI, NF-I or NF1.

In a given species, there are a large number of different CTF/NF-I proteins. 
The multiplicity of CTF/NF-I is known to be generated both by alternative 
splicing and by the occurrence of four different genes. The known forms of 
NF-I genes have been classified as: 
- The CTF-like factors subfamily (prototype form: CTF-1) [4]

- The NFI-X proteins. 

- The NFI-A proteins. 

- The NFI-B proteins. 

So far, all CTF/NF-I family members appear to have similar transcription and 
replication activities. 

CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid
sequence, almost perfectly conserved in all species and genes sequenced,
mediates site-specific DNA recognition, protein dimerization and Adenovirus
DNA replication. The C-terminal 100 amino acids contain the transcriptional
activation domain. This activation domain is the target of gene expression
regulatory pathways elicited by growth factors and it interacts with basal
transcription factors and with histone H3 [6].

A perfectly conserved, highly charged 12 residue peptide located in the N-terminal part of 
CTF/NF-I has been selected as a specific signature for this family of proteins. 

-Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R 

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.
Cell 58:741-753(1989).
[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K.,
Sippel A.E.
Nucleic Acids Res. 18:2607-2616(1990).
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.
Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S.,
Osborne T.F.
Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T.,
Wahli W., Mermod N.
Genes Dev. 9:3051-3066(1995).

90. Calsequestrin (Calsequestrin) 
Number of members: 13 

Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein
of cardiac and skeletal muscle [1], where it is located in the lumenal space
of the sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a
calcium buffer and plays an important role in muscle excitation-contraction
coupling. It is a highly acidic protein of about 400 amino acid
residues that binds more than 40 moles of calcium per mole of protein. There
are at least two different forms of calsequestrin: one which is expressed in
cardiac muscles and another in skeletal muscles. Both forms have highly
similar sequences.

Two signature sequences have been developed. The first corresponds to the
N-terminus of the mature protein; the second is located just in front of the
C-terminus of the protein, which is composed of a highly acidic tail of
variable length.

-Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V
-Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D

[ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.
Biochem. J. 283:767-772(1992).

91. Carboxyl_trans (Carboxyl transferase domain)
[1] Medline: 93374821
Primary structure of the monomer of the 12S subunit of
transcarboxylase as deduced from DNA and characterization of the
product expressed in Escherichia coli.
Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D;
J Bacteriol 1993;175:5301-5308.
[2] Medline: 93358891
Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.
All of the members in this family are biotin-dependent carboxylases.
The carboxyl transferase domain carries out the following reaction:
transcarboxylation from biotin to an acceptor molecule. There are
two recognised types of carboxyl transferase. One of them uses acyl-CoA
and the other uses 2-oxo acid as the acceptor molecule of carbon dioxide.
All of the members in this family utilise acyl-CoA as the acceptor
molecule.
Number of members: 47

92. Chal_stil_synt (Chalcone and stilbene synthases) 
Number of members: 146 

Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly
known as resveratrol synthases) are related plant enzymes [1]. CHS is an
important enzyme in flavonoid biosynthesis and STS a key enzyme in stilbene-type
phytoalexin biosynthesis. Both enzymes catalyze the addition of three
molecules of malonyl-CoA to a starter CoA ester (a typical example is
4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

These enzymes are proteins of about 390 amino-acid residues. A conserved
cysteine residue, located in the central section of these proteins, has been
shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaroyl-CoA group. The region
around this active site residue is well conserved and can be used as a
signature pattern.

In addition to the plant enzymes, this family also includes Bacillus subtilis
bcsA.

-Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-
[STAV]-x-[LIVMF]-[RA] [C is the active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).
[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

93. Chorismate_synt (Chorismate synthase) 
Number of members: 19 

Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the 
shikimate pathway which is used in prokaryotes, fungi and plants for the 
biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination 
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form 
chorismate which can then be used in phenylalanine, tyrosine or tryptophan 
biosynthesis. Chorismate synthase requires the presence of a reduced flavin 
mononucleotide (FMNH2 or FADH2) for its activity. 

Chorismate synthase from various sources shows [1,2] a high degree of sequence
conservation. It is a protein of about 360 to 400 amino-acid residues.
Three signature patterns have been developed from conserved regions rich in basic
residues (mostly arginines). The first is in the N-terminal section, the
second is central and the third is C-terminal.

-Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

-Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

-Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N. 

J. Biol. Chem. 266:21434-21438(1991). 
[ 2] Jones D.G.L., Reusser U., Braus G.H. 

Mol. Microbiol. 5:2143-2152(1991). 

94. Clat_adaptor_s (Clathrin adaptor complex small chain) 
Number of members: 21 

Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as 
receptor mediated endocytosis. In addition to clathrin, the CCV are composed 
of a number of other components including oligomeric complexes which are known 
as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor 
complexes are believed to interact with the cytoplasmic tails of membrane 
proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1 which is associated with the Golgi complex
and AP-2 which is associated with the plasma membrane. Both AP-1 and AP-2 are
heterotetramers that consist of two large chains - the adaptins - (gamma and
beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in
AP-2) and a small chain (AP19 in AP-1; AP17 in AP-2). 

The small chains of AP-1 and AP-2 are evolutionarily related proteins of about
18 Kd. Homologs of AP17 and AP19 have also been found in yeast (genes APS1/
YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related to the zeta-
chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic
protein transport from the endoplasmic reticulum, via the Golgi up to the
trans Golgi network.

A conserved region in the central section of these proteins has been selected as a signature
pattern.

-Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[ 1] Pearse B.M., Robinson M.S. 

Annu. Rev. Cell Biol. 6:151-171(1990). 
[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S.,
Tubb B.
J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.
Biochim. Biophys. Acta 1174:282-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.
EMBO J. 13:1706-1717(1994).
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G.,
Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).

95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8

Clathrin [1,2] is the major coat-forming protein that encloses vesicles such
as coated pits and forms cell surface patches involved in membrane traffic
within eukaryotic cells. The clathrin coats (called triskelions) are composed
of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).

The clathrin light chains [3], which may help to properly orient the assembly
and disassembly of the clathrin coats, bind non-covalently to the heavy chain;
they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: 
LC(a) and LC(b). Each of the two genes can yield, by tissue-specific 
alternative splicing, two separate forms which differ by the insertion of a 
sequence of respectively thirty or eighteen residues. There is, in the N-
terminal part of the clathrin light chains, a domain of twenty-one amino
acid residues which is perfectly conserved in LC(a) and LC(b).
- In yeast there is a single light chain (gene CLC1) whose sequence is only 
distantly related to that of higher eukaryotes. 

Two signature patterns have been developed for clathrin light chains. The first 
pattern is a heptapeptide from the center of the conserved N-terminal region 
of eukaryotic light chains; the second pattern is derived from a positively 
charged region located in the C-terminal extremity of all known clathrin light 
chains. 

-Consensus pattern: F-L-A-Q-Q-E-S 

[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.
Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS 

Each repeat is about 140 amino acids long. The repeats
occur in the arm region of the Clathrin heavy chain.
Number of members: 79

[1] Medline: 92191269
Folding and trimerization of clathrin subunits at the
triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2] Medline: 88097376
Clathrin heavy chain: molecular cloning and complete primary
structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.

97. Collagen (Collagen triple helix repeat (20 copies)) 

[1] Medline: 94059583
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.

Scurvy is associated with collagens.

Members of this family belong to the collagen superfamily [1].
Collagens are generally extracellular structural proteins
involved in formation of connective tissue structure.
The alignment contains 20 copies of the G-X-Y repeat that
forms a triple helix. The first position of the repeat is
glycine; the second and third positions can be any residue
but are frequently proline and hydroxyproline. Collagens
are post-translationally modified by proline hydroxylase
to form the hydroxyproline residues. Defective
hydroxylation is the cause of scurvy.

Some members of the collagen superfamily are not involved 
in connective tissue structure but share the same triple 
helical structure. 
Number of members: 2125 
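
The G-X-Y repeat described above can be illustrated with a short sketch that measures the longest uninterrupted run of glycine-led triplets in a sequence. This is only an illustration under simplifying assumptions: the function name longest_gxy_run is hypothetical, the test strings are made up, and a real family assignment would rely on the alignment model rather than on a plain triplet count.

```python
import re

# Lookahead so that runs starting in any reading frame are examined;
# group 1 captures the longest run of G-X-Y triplets from each position.
GXY_RUN = re.compile(r"(?=((?:G..)+))")

def longest_gxy_run(seq: str) -> int:
    """Return the number of consecutive G-X-Y triplets in the longest run."""
    best = 0
    for m in GXY_RUN.finditer(seq):
        best = max(best, len(m.group(1)) // 3)
    return best

# Made-up sequence: 20 copies of G-P-P flanked by unrelated residues.
toy_sequence = "MKL" + "GPP" * 20 + "AAK"
print(longest_gxy_run(toy_sequence))
```

A threshold of 20 copies, matching the alignment length stated above, could be applied to the returned count as a coarse filter.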

98. Coprogen_oxidas (Coproporphyrinogen III oxidase) 
Number of members: 12 

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] 
catalyzes the oxidative decarboxylation of coproporphyrinogen III into 
protoporphyrinogen IX, a common step in the pathway for the biosynthesis of 
porphyrins such as heme, chlorophyll or cobalamin.

Coproporphyrinogen III oxidase is an enzyme that requires iron for its
activity. A cysteine seems to be important for the catalytic mechanism [3].
Sequences from a variety of eukaryotic and prokaryotic sources show that 
this enzyme has been evolutionarily conserved. A highly conserved region in 
the central part of the sequence has been selected as a signature pattern. This 
region contains the only conserved cysteine and is rich in charged amino 
acids. 

-Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G- 
[LIVM]-F-F-D 

[ 1] Xu K., Elliott T. 

J. Bacteriol. 175:4990-4999(1993). 
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S. 

J. Biol. Chem. 268:21359-21363(1993). 
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.

Eur. J. Biochem. 156:579-587(1986). 
[ 4] Xu K., Elliott T. 

J. Bacteriol. 176:3196-3203(1994). 

99. Corona_nucleoca (Coronavirus nucleocapsid protein) 
[1] 

Medline: 98087828 

Identification of a specific interaction between the 
coronavirus mouse hepatitis virus A59 nucleocapsid protein 
and packaging signal. 
Molenkamp R, Spaan WJ; 
Virology 1997;239:78-86. 
Number of members: 44

100. Cu-oxidase (Multicopper oxidase)
[1] 

Medline: 90126844 

The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin. 
Modelling and structural relationships. 
Messerschmidt A, Huber R; 
Eur J Biochem 1990;187:341-352. 
Number of members: 150 

Multicopper oxidases [1,2] are enzymes that possess three spectroscopically 
different copper centers. These centers are called: type 1 (or blue), type 2 
(or normal) and type 3 (or coupled binuclear). The enzymes that belong to 
this family are: 

- Laccase (EC 1.10.3.2) (urushiol oxidase), an enzyme found in fungi and
plants, which oxidizes many different types of phenols and diamines.

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of
mammals and birds, which oxidizes a great variety of inorganic and organic
substances. Structurally ceruloplasmin exhibits internal sequence homology,
and seems to have evolved from the triplication of a copper-binding domain
similar to that found in laccase and ascorbate oxidase.

In addition to the above enzymes there are a number of proteins which, on the 
basis of sequence similarities, can be said to belong to this family. These 
proteins are: 

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae.
This protein seems to be involved in the resistance of the microbial host
to copper.

- Blood coagulation factor V (Fa V).

- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast
homolog.

Factors V and VIII act as cofactors in blood coagulation and are structurally 
similar [4]. Their sequence consists of a triplicated A domain, a B domain and 
a duplicated C domain; in the following order: A-A-B-A-C-C. The A-type domain 
is related to the multicopper oxidases. 

Two signature patterns have been developed for these proteins. Both patterns are 
derived from the same region, which in ascorbate oxidase, laccase, in the 
third domain of ceruloplasmin, and in copA, contains five residues that are 
known to be involved in the binding of copper centers. The first pattern does 
not make any assumption on the presence of copper-binding residues and thus 
can detect domains that have lost the ability to bind copper (such as those in 
Fa V and Fa VIII), while the second pattern is specific to copper-binding 
domains. 

-Consensus pattern: G-x-[FYW]-x-[LIVMFW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW] 
-Consensus pattern: H-C-H-x(3)-H-x(3)-[AG]-[LM] 

[The first two H's are copper type 3 binding residues] 

[The C, the 3rd H, and L or M are copper type 1 ligands] 
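
The distinction drawn above between the two patterns can be sketched as a two-step check: the second, copper-specific pattern is tried first, and the first, more permissive pattern is used as a fallback. The regular expressions below are hand translations of the two consensus patterns; the function name classify_domain, the returned labels and the test sequences are hypothetical and serve only as an illustration.

```python
import re

# G-x-[FYW]-x-[LIVMFW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]
GENERAL = re.compile(r"G.[FYW].[LIVMFW].[CST].{8}G[LM].{3}[LIVMFYW]")
# H-C-H-x(3)-H-x(3)-[AG]-[LM]
COPPER = re.compile(r"HCH.{3}H.{3}[AG][LM]")

def classify_domain(seq: str):
    """Return a coarse label based on which signature pattern is found."""
    if COPPER.search(seq):
        return "copper-binding"
    if GENERAL.search(seq):
        return "oxidase-like, copper ligands not detected"
    return None

# Made-up sequences constructed to satisfy one pattern each.
with_ligands = "MK" + "HCHAAAHAAAGL" + "EE"
without_ligands = "MK" + "GAFALAC" + "A" * 8 + "GL" + "AAA" + "L"
print(classify_domain(with_ligands), classify_domain(without_ligands))
```

Under this scheme a domain of the Fa V or Fa VIII type would be expected to fall into the second category, although that expectation is inferred from the description above and has not been checked against real sequences here.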

101. Cullin (Cullin family) 
Number of members: 24 

The following proteins are collectively termed cullins [1]: 

- Caenorhabditis elegans cul-1 (or lin-19), a protein required for
developmentally programmed transitions from the G1 phase of the cell cycle
to the G0 phase or the apoptotic pathway.

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK856.1) and
cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a
kidney-specific protein thought to form a cell surface receptor [2] but
which does not have any structural hallmarks of a receptor.

- Drosophila lin19.

- Yeast CDC53 [3], which acts in concert with CDC4 and UBC3 (CDC34) to
control the G1-to-S phase transition.

- Yeast hypothetical protein YGR003w. 

- Fission yeast hypothetical protein SpAC24H6.03. 

The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal 
extremity is the most conserved part of these proteins. A 
signature pattern has been developed from that region. 

-Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R- 
x(6,7)-[FY]-x-Y-x-[SA]> 
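
In PROSITE-style notation a trailing ">" restricts the pattern to the C-terminal end of the sequence, and x(6,7) allows six or seven residues of any kind. The sketch below is a hand translation of the cullin pattern into a regular expression anchored at the end of the string. The function name has_cullin_signature and the test sequences are hypothetical.

```python
import re

# [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]>
# \Z anchors the match at the very end of the sequence (the ">" in the pattern).
CULLIN = re.compile(
    r"[LIV]K.{2}[LIV].{2}LI[DEQ][KRHNQ].Y[LIVM].R.{6,7}[FY].Y.[SA]\Z"
)

def has_cullin_signature(seq: str) -> bool:
    """True if the sequence ends with a match to the cullin pattern."""
    return CULLIN.search(seq) is not None

# Made-up C-terminal segment built to satisfy the pattern.
tail = "LKAALAALIDKAYLAR" + "A" * 6 + "FAYAS"
print(has_cullin_signature("MMMM" + tail))        # segment at the C-terminus
print(has_cullin_signature("MMMM" + tail + "K"))  # one extra residue after it
```

The second call illustrates why the anchor matters: the same segment followed by an additional residue no longer satisfies the pattern.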

[ 1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M. 
Cell 85:829-839(1996). 
[ 2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M.,
Dewitt D.L.
Am. J. Physiol. 268:F1198-F1210(1995).
[ 3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R., 
Byers B., Goebl M.G. 
Mol. Cell. Biol. 16:6634-6643(1996). 

102. (Cu_amine_oxid) 

Copper amine oxidase signatures 

Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic 
amines including many neurotransmitters, histamine and xenobiotic amines. There are two 
classes of amine oxidases: flavin-containing (EC 1.4.3.4) and copper-containing (EC 1.4.3.6). 

Copper-containing AO is found in bacteria, fungi, plants and animals; it is a homodimeric
enzyme that binds one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine
quinone (or topaquinone) (TPQ) cofactor. This cofactor is derived from a tyrosine residue.

Two signature patterns were derived for copper AO: the first contains the tyrosine which
gives rise to the TPQ cofactor, while the second contains one of the three histidines
that bind the copper atom [2].

Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y
gives rise to TPQ] Sequences known to belong to this class detected by the pattern: ALL.

Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]
Sequences known to belong to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A.,
Eds., 30:361-403, Marcel Dekker, New-York, (1993).

[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S.,
Phillips S.E.V., McPherson M.J., Knowles P.F. Structure 3:1171-1184(1995).

103. Cys-protease (Cysteine protease) 
Number of members: 358 

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic 
enzymes which contain an active site cysteine. Catalysis proceeds through a 
thioester intermediate and is facilitated by a nearby histidine side chain; an 
asparagine completes the essential catalytic triad. The proteases which are 
currently known to belong to this family are listed below (references are 
only provided for recently determined sequences). 

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L 
(EC 3.4.22.15), and S (EC 3.4.22.27) [2]. 

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as 
cathepsin C) [2]. 

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-
activated thiol proteases that contain both an N-terminal catalytic domain
and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption
[3].

- Human cathepsin O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the
antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, 
rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain 
(EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and 
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple 
stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and 
gamma; tomato low-temperature induced, Arabidopsis thaliana A494, RD19A and 
RD21A. 

- House-dust mites allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes
gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31)
and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2),
and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva. 

- Baculoviruses cathepsin-like enzyme (v-cath). 

- Drosophila small optic lobes protein (gene sol), a neuronal protein that 
contains a calpain-like domain. 

- Yeast thiol protease BLH1/YCP1/LAP3. 

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like 
protein. 

Two bacterial peptidases are also part of this family: 

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 

- Thiol protease tpr from Porphyromonas gingivalis. 



Three other proteins are structurally related to this family, but may have
lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine
replaced by a glycine.

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L
but with the active site cysteine replaced by a serine. Rat testin
should not be confused with mouse testin which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage 
antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like 
domain [6], but the active site cysteine is replaced by a serine. 

The sequences around the three active site residues are well conserved and can 
be used as signature patterns. 

-Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site 
residue] 

-Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x- 
[GSADNH] [H is the active site residue] 

-Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G- 
[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue] 
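
The bracketed annotations identify which position of each pattern is the active site residue. As a minimal sketch, the first pattern can be translated by hand into a regular expression with a capturing group around the cysteine, so that a match also reports where the residue lies. The function name active_site_cys and the test sequences are hypothetical; the second test sequence simply replaces the cysteine with glycine to mimic the kind of substitution described above for soybean P34.

```python
import re

# Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV], with the active site C captured.
CYS_SITE = re.compile(r"Q.{3}[GE].(C)[YW].{2}[STAGC][STAGCV]")

def active_site_cys(seq: str):
    """Return the 1-based position of the active site cysteine, or None."""
    m = CYS_SITE.search(seq)
    return m.start(1) + 1 if m else None

# Made-up sequences: one satisfying the pattern, one with the C changed to G.
intact = "MK" + "QAAAGACWAASS"
substituted = "MK" + "QAAAGAGWAASS"
print(active_site_cys(intact), active_site_cys(substituted))
```

The same approach could be applied to the histidine and asparagine patterns by moving the capturing group to the annotated residue.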

[ 1] Dufour E. Biochimie 70:1335-1342(1988). 

[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995). 

[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett.
357:129-134(1995).

[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 
269:27136-27142(1994). 

[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. 
Microbiol. 59:330-333(1993). 

[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989). 
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

104. Cys_Met_Meta_PP (Cys/Met metabolism PLP-dependent enzyme)
[1] Medline: 96428687
Crystal structure of the pyridoxal-5'-phosphate dependent
cystathionine beta-lyase from Escherichia coli at 1.83 A.
Clausen T, Huber R, Laber B, Pohlenz HD, Messerschmidt A;
J Mol Biol 1996;262:202-224.
[2] Medline: 99059720
Crystal structure of Escherichia coli cystathionine
gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa; [SCOP-USA] [CATH-PDBSUM]
This family includes enzymes involved in cysteine and
methionine metabolism. The following are members:
Cystathionine gamma-lyase,
Cystathionine gamma-synthase,
Cystathionine beta-lyase,
Methionine gamma-lyase,
OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase.
All of these members participate in slightly different reactions.
All these enzymes use PLP (pyridoxal-5'-phosphate) as a cofactor.
Number of members: 52

A number of pyridoxal-dependent enzymes involved in the metabolism of
cysteine, homocysteine and methionine have been shown [1,2] to be evolutionarily
related. These are:

- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which
catalyzes the transformation of cystathionine into cysteine, oxobutanoate
and ammonia. This is the final reaction in the transsulfuration pathway that
leads from methionine to cysteine in eukaryotes.
- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion
of cysteine and succinyl-homoserine into cystathionine and succinate: the
first step in the biosynthesis of methionine from cysteine in bacteria
(gene metB).
- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes
the conversion of cystathionine into homocysteine, pyruvate and ammonia:
the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).
- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase) which catalyzes the
transformation of methionine into methanethiol, oxobutanoate and ammonia.
- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine
into homocysteine and that of acetylserine into cysteine (gene MET17 or
MET25 in yeast).
- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).
- Yeast hypothetical protein YGL184c.
- Yeast hypothetical protein YHR112c.

These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P
group is attached to a lysine residue located in the central section of these
enzymes; the sequence around this residue is highly conserved and can be used
as a signature pattern to detect this class of enzymes.

-Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G- 
[HQ]-[SGNH] [K is the pyridoxal-P attachment site] 

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S.,
Ohmori S., Oshima T., Toh-E A.
J. Bacteriol. 174:3339-3347(1992).
[ 2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F.,
Storms R.K., Zeng B., Zhong W.W., Fortin N., Delaney S., Bussey H.
Yeast 9:363-369(1993).



105. Cyt_reductase 

FAD/NAD-binding Cytochrome reductase 
Number of members: 60
[1] Medline: 95111952 

Crystal structure of the FAD-containing fragment of corn 
nitrate reductase at 2.5 A resolution: relationship to other 
flavoprotein reductases. 
Lu G, Campbell WH, Schneider G, Lindqvist Y; 

Structure 1994;2:809-821. 
[2] Medline: 92084635 

The sequence of squash NADH:nitrate reductase and its 
relationship to the sequences of other flavoprotein 
oxidoreductases. A family of flavoprotein pyridine 
nucleotide cytochrome reductases. 
Hyde GE, Crawford NM, Campbell WH; 
J Biol Chem 1991;266:23542-23547. 



106. Cytidylyltrans 
Phosphatidate cytidylyltransferase 
Number of members: 21 

Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP- 
diacylglycerol synthase) (CDS) is the enzyme that catalyzes the synthesis of 
CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol is an 
important branch point intermediate in both prokaryotic and eukaryotic 
organisms. CDS is a membrane-bound enzyme. A conserved region located in the 
C-terminal part has been selected as a signature pattern. 

-Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G- 
[LIVM]-x-D-R-[LIVMF]-D 

[ 1] Sparrow C.P., Raetz C.R.H.

J. Biol. Chem. 260:12084-12091(1985). 
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W. 

J. Biol. Chem. 271:789-795(1996). 
[ 3] Saito S., Goto K., Tonosaki A., Kondo H.
J. Biol. Chem. 272:9503-9509(1997). 

107. (Cytidylyltransf) Cytidylyltransferase. This family includes: Cholinephosphate
cytidylyltransferase. Glycerol-3-phosphate cytidylyltransferase.

Number of members: 64 

[1] Medline: 10208837 CTP:Phosphocholine Cytidylyltransferase: Insights into Regulatory 
Mechanisms and Novel Functions. Clement JM, Kent C; Biochem Biophys Res Commun 
1999;257:643-650. 

108. (cNMP binding) Cyclic nucleotide-binding domain signatures and profile 

Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120
residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator
(also known as the cAMP receptor protein) (gene crp) where such a domain is known to be
composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel
structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases
contain two tandem copies of the cyclic nucleotide-binding domain. The cAPK's are
composed of two different subunits: a catalytic chain and a regulatory chain which contains
both copies of the domain. The cGPK's are single chain enzymes that include the two copies
of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is
due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant
in cGPK is an alanine in most cAPK.

- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully
characterized. One is found in rod cells where it plays a role in visual signal transduction. It
specifically binds to cGMP leading to an opening of the channel and thereby causing a
depolarization of rod photoreceptors. In olfactory epithelium a similar, cAMP-binding,
channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that
are thought to be essential for maintenance of the beta-barrel. Two signature patterns for this
domain have been developed. The first pattern is located within beta-barrels 2 and 3 and
contains the first two conserved Gly. The second pattern is located within beta-barrels 6 and 7
and contains the third conserved Gly as well as the three other invariant residues.

First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-
x(2)-G

Second consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-
[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989). 

[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991). 

[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992). 

109. (cadherin) 

Cadherins extracellular repeated domain signature 

Cadherins [1,2] are a family of animal glycoproteins responsible for calcium-dependent cell- 
cell adhesion. Cadherins preferentially interact with themselves in a homophilic manner in 
connecting cells; thus acting as both receptor and ligand. A wide number of tissue-specific 
forms of cadherins are known: 

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1). 

- Neural (N-cadherin) (CDH2). 

- Placental (P-cadherin) (CDH3). 

- Retinal (R-cadherin) (CDH4). 

- Vascular endothelial (VE-cadherin) (CDH5). 

- Kidney (K-cadherin) (CDH6). 

- Cadherin-8 (CDH8). 

- Osteoblast (OB-cadherin) (CDH11). 

- Brain (BR-cadherin) (CDH12). 

- T-cadherin (truncated cadherin) (CDH13). 

- Muscle (M-cadherin) (CDH14). 

- Liver-intestine (LI-cadherin).

- EP-cadherin.

Structurally, cadherins are built of the following domains: a signal sequence, followed by a
propeptide of about 130 residues, then an extracellular domain of around 600 residues, then a
transmembrane region, and finally a C-terminal cytoplasmic domain of about 150 residues.
The extracellular domain can be subdivided into five parts: there are four repeats of about
110 residues followed by a region that contains four conserved cysteines. It is suggested that
the calcium-binding region of cadherins is located in the extracellular repeats.

Cadherins are evolutionarily related to the desmogleins, which are components of intercellular
desmosome junctions involved in the interaction of plaque proteins:

- Desmoglein 1 (desmosomal glycoprotein I).

- Desmoglein 2.

- Desmoglein 3 (Pemphigus vulgaris antigen).

The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34
cadherin-like repeats in its extracellular domain.

The signature pattern that was developed for the repeated domain is located in its C-
terminal extremity, which is its best conserved region. The pattern includes two conserved
aspartic acid residues as well as two asparagines; these residues could be implicated in the
binding of calcium.

Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P Sequences known to belong to this
class detected by the pattern: ALL. Note: this pattern is found in the first, second, and fourth
copies of the repeated domain. In the third copy there is a deletion of one residue after the
second conserved Asp.

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990). 
[ 2] Takeichi M. Trends Genet. 3:213-217(1987). 
[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell
67:853-868(1991).



110. Calreticulin family signatures 
Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-
binding protein which is present in most tissues and located at the periphery of the
endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes. It probably plays a role in
the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of
three domains:

a) An N-terminal, probably globular, domain of about 180 amino acid residues (N-domain);
b) A central domain of about 70 residues (P-domain) which contains three repeats of an
acidic 17 amino acid motif. This region binds calcium with a low capacity, but a high affinity;
c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region binds
calcium with a high capacity but a low affinity.

Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses
a C-terminal domain rich in lysine and arginine and lacks acidic residues and is therefore not
expected to bind calcium in that region.

- Calnexin [2]. A calcium-binding protein that interacts with newly synthesized glycoproteins
in the endoplasmic reticulum. It seems to play a major role in the quality control apparatus of
the ER by the retention of incorrectly folded proteins.

- Calmegin [3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to
calnexin.

Three signature patterns have been developed for this family of proteins. The first two
patterns are based on conserved regions in the N-domain; the third pattern corresponds to
positions 4 to 16 of the repeated motif in the P-domain.

Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-
[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992). 
[ 2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci.
19:124-128(1994). 

[ 3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune 
Y. J. Biol. Chem. 269:7744-7749(1994). 

111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase)

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the
reversible hydration of carbon dioxide. Eight enzymatic and evolutionarily related forms of
carbonic anhydrase are currently known to exist in vertebrates: three cytosolic isozymes (CA-
I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a mitochondrial
form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the
alga Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are
periplasmic glycoproteins evolutionarily related to vertebrate CAs. Some bacteria, such as
Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.

CAs contain a single zinc atom bound to three conserved histidine residues. As a signature
for CAs, a pattern has been developed which includes one of these zinc-binding histidines.
Protein D8 from Vaccinia and other poxviruses is related to CAs but has lost two of the
zinc-binding histidines as well as many otherwise conserved residues. This is also true of the
N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2)
[The second H is a zinc ligand]

Note: most prokaryotic CA's as well as plant chloroplast CA's belong to another, evolutionarily
distinct family of proteins (see <PDOC00586>).

[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987). 

[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988). 

[ 3] Tashian R.E. BioEssays 10:186-192(1989). 

[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990). 

[ 5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993). 

[ 6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).

[ 7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.H. J.
Mol. Biol. 283:301-310(1998).

112. Caseins alpha/beta signature 

Caseins [1] are the major protein constituent of milk. Caseins can be classified into two
families; the first consists of the kappa-caseins, and the second groups the alpha-s1, alpha-s2,
and beta-caseins. The alpha/beta caseins are a rapidly diverging family of proteins. However,
two regions are conserved: a cluster of phosphorylated serine residues and the signal
sequence. The signature pattern has been developed for this family of proteins based upon
the last eight residues of the signal sequence.
Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[ 1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes 
hydrogen peroxide to molecular oxygen and water. Its main function is to protect cells from 
the toxic effects of hydrogen peroxide. In eukaryotic organisms and in some prokaryotes 
catalase is a molecule composed of four identical subunits. Each of the subunits binds one 
protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The 
region around this residue has been used as a first signature pattern; it also includes a 
conserved arginine that participates in heme-binding. A conserved histidine has been shown 
to be important for the catalytic mechanism of the enzyme. The region around this residue 
has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal 
heme-binding ligand] 

Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an 
active site residue] 

Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>). 

[ 1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol.
152:465-499(1981).

[ 2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita
I., Murthy M.R.N., Rossmann M.G. J. Mol. Biol. 188:63-72(1986).

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

114. (chitin binding) Chitin recognition or binding domain signature 

A conserved domain of 43 amino acids is found in several plant and fungal proteins that have
a common binding specificity for oligosaccharides of N-acetylglucosamine [1]. This domain
may be involved in the recognition or binding of chitin subunits. It has been found in the
proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the
three highly homologous wheat germ agglutinins (WGA-1, 2 and 3). WGA is an
N-acetylglucosamine/N-acetylneuraminic acid binding lectin which structurally consists of a
fourfold repetition of the 43 amino acid domain. The same type of structure is found in a
barley root-specific lectin as well as a rice lectin.

- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are
enzymes that catalyze the hydrolysis of the beta-1,4 linkages of N-acetylglucosamine
polymers of chitin. Plant chitinases function as a defense against chitin-containing fungal
pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain at
their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle
Urtica dioica, which contains two copies of the domain.

- Hevein [5], a wound-induced protein found in the latex of rubber trees.

- Win1 and win2, two wound-induced proteins from potato.

- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid
pGKL1 is composed of three subunits: alpha, beta, and gamma. The gamma subunit harbors
toxin activity and inhibits growth of sensitive yeast strains in the G1 phase of the cell cycle;
the alpha subunit, which is proteolytically processed from a larger precursor that also
contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly
follows the signal sequence and is therefore at the N-terminal of the mature protein; in the
killer toxin alpha subunit it is located in the central section of the protein. The domain
contains eight conserved cysteine residues which have all been shown, in WGA, to be
involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown
in the following figure:

[Figure: schematic of the four disulfide bonds linking the eight conserved cysteines; the bond
connectivity is not recoverable from the source text.]

xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

-Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five Cs are involved 
in disulfide bonds] 

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).

[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).

[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem.
199:483-488(1991).

115. (Chitinase 1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the
beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence
similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl
hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are
enzymes from plants that function in the defense against fungal and insect pathogens by
destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the
presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain (see the relevant
entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230
amino acid residues. Two highly conserved regions have been selected as signature patterns.
The first one is located in the N-terminal section and contains one of the six cysteines which
are conserved in most, if not all, of these chitinases and which is probably involved in a
disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM] 

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

116. chloroa_b-bind 

Chlorophyll A-B binding proteins. Number of members: 211 

117. chromo 

The 'chromo' (CHRromatin Organization Modifier) domain [1 to 4] is a conserved
region of about 60 amino acids which was originally found in Drosophila
modifiers of variegation, which are proteins that modify the structure of
chromatin to the condensed morphology of heterochromatin, a cytologically
visible condition where gene expression is repressed. In protein Polycomb, the
chromo domain has been shown to be important for chromatin targeting. Proteins
that contain a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which
is related to but distinct from the chromo domain and which has been
termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

Currently, this domain has been found in the following proteins: 
Class A. 

- Drosophila heterochromatin protein Su(var)205 (HP1).

- Human heterochromatin protein HP1 alpha. 

- Mammalian modifier 1 and modifier 2. 

- Fission yeast swi6, a protein involved in the repression of the silent 
mating-type loci mat2 and mat3. 

Class B. 

- Drosophila protein Polycomb (Pc). 

- Mammalian modifier 3, a homolog of Pc. 

- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation. 
- Human Mi-2 autoantigen, characteristic of dermatomyositis.

- Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum,
'grasshopper' and 'MAGGY' from Magnaporthe grisea and CfT-1 from 
Cladosporium fulvum. 

- Fission yeast hypothetical protein SpAC18G6.02c. 

- Caenorhabditis elegans hypothetical protein C29H12.5.

- Caenorhabditis elegans hypothetical protein ZK1236.2. 

- Caenorhabditis elegans hypothetical protein T09A5.8. 

Class C. 

- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.

- Yeast protein CHD1. 

The signature pattern for this domain corresponds to its best conserved section, which is 
located in its central part. 



Attorney No. 2750-1237P 

-Consensus pattern: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLME]-x(5,6)-[ST]-W-
[ESV]-[PSTDEN]-x(2,3)-[LIVMC]

[ 1] Paro R. Trends Genet. 6:416-421(1990). 

[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt
S.J. Nucleic Acids Res. 19:789-794(1991).

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995). 

[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

118. citrate_synt 

Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that 
catalyzes the synthesis of citrate from oxaloacetate and acetyl-CoA in an 
aldol condensation. CS can directly form a carbon-carbon bond in the absence 
of metal ion cofactors.

In prokaryotes, citrate synthase is composed of six identical subunits. In 
eukaryotes, there are two isozymes of citrate synthase: one is found in the 
mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of 
identical chains. 

There are a number of regions of sequence similarity between prokaryotic and 
eukaryotic citrate synthases. One of the best conserved contains a histidine 
which is one of three residues shown [1] to be involved in the catalytic 
mechanism of the vertebrate mitochondrial enzyme. This region has been used as a 
signature pattern. 

-Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active
site residue]

[ 1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).

119. clpA_B
Chaperonin clpA/B 

CAUTION! This family is a subfamily of the AAA 
superfamily. The threshold has been set very high to 
stop overlaps with the AAA superfamily. This 
entry will be subsumed by AAA in the future. 
Number of members: 39 

A number of ATP-binding proteins that are thought to protect cells from
extreme stress by controlling the aggregation and denaturation of vital
cellular structures have been shown [1,2] to be evolutionarily related. These
proteins are listed below.

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-
dependent protease clp. 

- Rhodopseudomonas blastica clpA homolog. 

- Escherichia coli heat shock protein clpB and homologs in other bacteria. 

- Bacillus subtilis protein mecB. 

- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to 
heat, ethanol and other stresses. 

- Neurospora heat shock protein hsp98. 

- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].

- CD4A and CD4b, two highly related tomato proteins that seem to be located
in the chloroplast. 

- Trypanosoma brucei protein clp. 

- Porphyra purpurea chloroplast encoded clpC. 

The size of these proteins ranges from 84 Kd (clpA) to slightly more than 100
Kd (HSP104). They all share two conserved regions of about 200 amino acids
that each contains an ATP-binding site. In addition to the ATP-binding A and 
B motifs there are many parts in these two domains that are also conserved. Two 
of these regions have been selected as signature patterns. The first signature 
is located in the first domain, some ten residues to the C-terminal of the 
ATP-binding B motif. The second pattern is located in the second domain in- 
-Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

-Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-
[LIVM]-x-G-[STA]

[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S.,
Dalrymple B., Kuramitsu H., Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L.,
Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517(1990).

[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).

[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

120. cofilin_ADF 

Cofilin/tropomyosin-type actin-binding proteins 
[1] 

Medline: 97290449 
Structure determination of yeast cofilin. 
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC; 

Nat Struct Biol 1997;4:366-369. 

[2] 

Medline: 97290450 

Crystal structure of the actin-binding protein actophorin 
from Acanthamoeba. 
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE; 

Nat Struct Biol 1997;4:369-373. 

[3] 

Medline: 97420794 

F-actin and G-actin binding are uncoupled by mutation of 
conserved tyrosine residues in maize actin depolymerizing 
factor. 

Jiang CJ, Weeds AG, Khan S, Hussey PJ; 
Proc Natl Acad Sci U S A 1997;94:9973-9978. 
[4]

Medline: 97357155

Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.

Severs actin filaments and binds to actin monomers. 
Number of members: 44 

Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to
actin monomers, or G-actin, thus preventing actin polymerization by
sequestering the monomers. The following proteins are evolutionarily related
and belong to a family of low molecular weight (137 to 166 residues) actin-
depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin
and acts as a pH-dependent actin-depolymerizing protein. 

- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent 
manner and prevents polymerization. 

- Caenorhabditis elegans unc-60. 

- Acanthamoeba castellanii actophorin.

- Plant actin-depolymerizing factor (ADF).

The most conserved region of these proteins is a twenty amino-acid segment
that ends some 30 residues from their C-terminal extremity. This segment has
been shown [5] to be important for actin-binding.

-Consensus pattern: P-[DE]-x-[SA]-x-[LIW^ 
x(3)-[LIVMF]-[KR] 

[ 1] Hawkins M., Pope B., Maciver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).

[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene
124:115-120(1993).

[ 3] Quirk S., Maciver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J.,
Vandekerckhove J., Pollard T.D. Biochemistry 32:8525-8533(1993).

[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.R., Baillie D.L. Mol. Gen. Genet.
242:346-357(1994).

[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem.
267:7240-7244(1992).

121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner
mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria (as
a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this
bioenergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals),
which is a component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a
2Fe-2S iron-sulfur cluster. The 24 Kd subunit is nuclear encoded, as a precursor form with a
transit peptide in mammals, and in Neurospora crassa. The 24 Kd subunit is highly similar to
[3,4]:

- Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE).

- Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

A highly conserved region, located in the central section of this subunit and containing two
conserved cysteines that are probably involved in the binding of the 2Fe-2S center, has been
selected as a signature pattern.

-Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative
2Fe-2S ligands] 

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992). 
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109- 
122(1993). 



122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 
Number of members: 70 
Blue or 'type-1' copper proteins are small proteins which bind a single
copper atom and which are characterized by an intense electronic absorption
band near 600 nm [1,2]. The best known members of this class of proteins
are the plant chloroplastic plastocyanins, which exchange electrons with
cytochrome c6, and the distantly related bacterial azurins, which exchange
electrons with cytochrome c551. This family of proteins also includes all the 
proteins listed below (references are only provided for recently determined 
sequences). 

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus
versutus that can grow on methylamine. Amicyanin appears to be an electron 
receptor for methylamine dehydrogenase. 

- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can
donate electrons to cytochrome c-554.

- Blue copper protein from Alcaligenes faecalis.

- Cupredoxin (CPC) from cucumber peelings [4]. 

- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber. 

- Halocyanin from Natrobacterium pharaonis [5], a membrane associated copper- 
binding protein. 

- Pseudoazurin from Pseudomonas.

- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron
carrier from cytochrome c-552 to the a-type oxidase [6].

- Stellacyanin from the Japanese lacquer tree. 

- Umecyanin from horseradish roots. 

- Allergen Ra3 from ragweed. This pollen protein is evolutionarily related to
the above proteins, but seems to have lost the ability to bind copper.

Although there is an appreciable amount of divergence in the sequence of all
these proteins, the copper ligand sites are conserved and a pattern which includes two
of the ligands (a cysteine and a histidine) has been developed.



-Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-
[MQ] [C and H are copper ligands]

[ 1] Garret T.P.J., Clingeleffer D.J., Guss J.M., Rogers S.J., Freeman H.C. J. Biol. Chem.
259:2822-2825(1984).

[ 2] Ryden L.G., Hunt L.T. J. Mol. Evol. 36:41-66(1993).

[ 3] McManus J.D., Brune D.C., Han J., Sanders-Loehr J., Meyer T.E., Cusanovich M.A., 
Tollin G., Blankenship R.E. J. Biol. Chem. 267:6531-6540(1992). 

[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R.
FEBS Lett. 314:220-223(1992). 

[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. 
Chem. 269:14939-14945(1994). 

[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).

123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of
oligomeric protein complexes. They seem to assist other polypeptides in maintaining or
assuming conformations which permit their correct assembly into oligomeric structures. They
are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins form
oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein,
known as cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in
bacteria). The cpn10 protein binds to cpn60 in the presence of MgATP and suppresses the
ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid residues whose
sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4].
Cpn10 assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a
region located in the N-terminal section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-
[SG]-x-[LIVMFY](3)

Note: this pattern is found twice in the plant chloroplast protein, which consists of the tandem
repeat of a cpn10 domain.

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991). 

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A.
89:3394-3398(1992).

[ 4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A.
89:8696-8700(1992).

[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of
oligomeric protein complexes. Their role seems to be to assist other polypeptides to maintain
or assume conformations which permit their correct assembly into oligomeric structures.
They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd
protein, known as cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in
bacteria). The cpn60 protein shows weak ATPase activity and is a highly conserved protein of
about 550 to 580 amino acid residues which has been described by different names in
different species:

- Escherichia coli groEL protein, which is essential for the growth of the
bacteria and the assembly of several bacteriophages.

- Cyanobacterial groEL analogues.

- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetti heat shock protein B
(gene htpB), Rickettsia tsutsugamushi major antigen 58, and Chlamydial 57 Kd
hypersensitivity antigen (gene hypB).

- Chloroplast RuBisCO subunit binding-protein alpha
and beta chains, which bind ribulose bisphosphate carboxylase small and large subunits and
are implicated in the assembly of the enzyme oligomer.

- Mammalian mitochondrial matrix protein P1 (mitonin or P60).

- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues,
located in the last third of the cpn60 sequence, was chosen.

Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).



Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice where
it is especially abundant in testis but present in all cell types. It has since been found and
characterized in many other mammalian species, in Drosophila and in yeast. TCP-1 is a
highly conserved protein of about 60 Kd (556 to 560 residues) which participates in a
hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits.
These subunits, the chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon,
zeta and eta, are evolutionarily related to TCP-1 itself [4,5]. The CCT is known to act as a
molecular chaperone for tubulin, actin and probably some other proteins. The CCT subunits
are highly related to archaebacterial counterparts:

- TF55 and TF56 [6], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase
activity, is known to bind unfolded polypeptides and forms an oligomeric complex of two
stacked nine-membered rings.

- Thermosome [7], from Thermoplasma acidophilum. The thermosome is composed of two
subunits (alpha and beta) and also seems to be a chaperone with ATPase activity. It forms an
oligomeric complex of eight-membered rings.

The TCP-1 family of proteins is weakly, but significantly [8], related to the cpn60/groEL
chaperonin family (see <PDOC00268>). As signature patterns of this family of chaperonins,
three conserved regions located in the N-terminal domain were chosen.

Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)

Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-
[LIVM]-x-[LIVM]-x-[SNH]-[PQH]

Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[ 1] Ellis J. Nature 358:191-192(1992).

[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).

[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).

[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).

[ 5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).

[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature
354:490-493(1991).

[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-
Seyler 376:119-126(1995).

[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related
to that found in TFIIB and the RB protein.
[1]

Medline: 94203808

Evidence for a protein domain superfamily shared by the cyclins,
TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.

[2]

Medline: 96164440
The crystal structure of cyclin A
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure. 1995;3:1235-1247.
Complex of cyclin and cyclin-dependent kinase.

[3]

Medline: 96313126
Structural basis of cyclin-dependent kinase activation by
phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;

Nat Struct Biol. 1996;3:696-700.
Cyclins regulate cyclin-dependent kinases (CDKs).
The most divergent prosite members have been included. Swiss:P22674
the Uracil-DNA glycosylase 2 is the highest noise and may be related
but has not been included.
Number of members: 189

Cyclins [1,2,3] are eukaryotic proteins which play an active role in
controlling nuclear cell division cycles. Cyclins, together with the p34
(cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are
two main groups of cyclins:

- G2/M cyclins, essential for the control of the cell cycle at the G2/M
(mitosis) transition. G2/M cyclins accumulate steadily during G2 and are
abruptly destroyed as cells exit from mitosis (at the end of the M-phase).

- G1/S cyclins, essential for the control of the cell cycle at the G1/S
(start) transition.

In most species, there are multiple forms of G1 and G2 cyclins. For example,
in vertebrates, there are two G2 cyclins, A and B, and at least three G1
cyclins, C, D, and E.

A cyclin homolog has also been found in herpesvirus saimiri [4].

The best conserved region is in the central part of the cyclins' sequences,
known as the 'cyclin-box'. From this, a 32 residue pattern has been derived.

-Consensus pattern: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-
[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-
[LIVMFYW]
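
The stated length of a signature can be checked by summing the number of residues each pattern element spans. The sketch below, with the hypothetical helper name pattern_length_range, returns the shortest and longest stretch a pattern can cover; for the cyclin pattern above, which has only fixed repeat counts, both values come out as 32, in agreement with the text.

```python
import re

def pattern_length_range(pattern: str):
    """Return (min_len, max_len) of the stretch a PROSITE-style pattern can cover."""
    lo_total = hi_total = 0
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate line-wrap hyphens that leave empty pieces
        # A trailing "(n)" or "(n,m)" gives the repeat count; otherwise it is 1.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        else:
            lo = hi = 1
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total

# Cyclin pattern copied from the entry above (joined onto one logical string).
CYCLIN_PATTERN = (
    "R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-"
    "[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-"
    "[LIVMFYW]"
)

if __name__ == "__main__":
    print(pattern_length_range(CYCLIN_PATTERN))
```

For patterns with variable gaps such as x(0,2), the two returned values differ, which gives a quick bound on how long a candidate match can be.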

[ 1] Nurse P. Nature 344:503-508(1990).

[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).

[ 3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992).

[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

126. Cystatin domain 

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically,
either the N-terminal or C-terminal end is very divergent, but splitting into two domains
would make very short families. Cathelicidins are related to this family but have not been
included. Number of members: 147

Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids 
of animals, in the larva of the worm Onchocerca volvulus [4], as well as in plants, can be 
grouped into three distinct but related families: 
- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with
neither disulfide bonds nor carbohydrate groups.

- Type 2 cystatins, molecules of about 115 amino acid residues which contain one
or two disulfide loops near their C-terminus.

- Kininogens, which are multifunctional plasma glycoproteins.

They are the precursors of the active peptide bradykinin and play a role in blood
coagulation by helping to position optimally prekallikrein and factor XI next to factor XII.
They are also inhibitors of cysteine proteases. Structurally, kininogens are made of three
contiguous type-2 cystatin domains, followed by an additional domain (of variable length)
which contains the sequence of bradykinin. The first of the three cystatin domains seems to
have lost its inhibitory activity.

In all these inhibitors, there is a conserved region of five residues which has been 
proposed to be important for the binding to the cysteine proteases. The consensus pattern 
starts one residue before this conserved region. 

-Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-
[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[ 1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[ 2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[ 3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).

[ 4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol.
45:65-76(1991).



127. cytochrome_c (Cytochrome c)
The Pfam entry does not include all prosite members.
The cytochrome 556 and cytochrome c' families are
not included.
Number of members: 259



In proteins belonging to cytochrome c family [1], the heme group is covalently 
attached by thioether bonds to two conserved cysteine residues. The consensus 
sequence for this site is Cys-X-X-Cys-His and the histidine residue is one of 
the two axial ligands of the heme iron. This arrangement is shared by all
proteins known to belong to cytochrome c family, which presently includes
cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction
center cytochrome c.

-Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}

[ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).
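
In the pattern above, braces list residues that are not allowed at a position, so {CPWHF} means any residue except C, P, W, H or F. The sketch below is a hand translation of that pattern into a Python regular expression and reports the positions of the two heme-binding cysteines and the histidine ligand. The name heme_binding_sites and the short test fragments are assumptions made for illustration only.

```python
import re

# Hand translation of C-{CPWHF}-{CPWR}-C-H-{CFYW}:
# each {...} element becomes a negated character class [^...].
HEME_SITE = re.compile(r"C[^CPWHF][^CPWR]CH[^CFYW]")

def heme_binding_sites(sequence: str):
    """Return 0-based (first Cys, second Cys, His) indices for every match."""
    return [
        (m.start(), m.start() + 3, m.start() + 4)
        for m in HEME_SITE.finditer(sequence)
    ]

if __name__ == "__main__":
    # Illustrative fragment containing one C-x-x-C-H site.
    print(heme_binding_sites("GDVEKGKKIFVQKCAQCHTVEK"))
    # Same fragment with a proline at the first excluded position: no match.
    print(heme_binding_sites("GDVEKGKKIFVQKCPQCHTVEK"))
```

Note that the final {CFYW} element requires one more residue after the histidine, so a C-x-x-C-H motif at the very end of a sequence would not be reported by this expression.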

128. (DAGKa) Diacylglycerol kinase accessory domain (presumed) 

Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. 
This domain is assumed to be an accessory domain: its function is unknown. 

[1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348.
[2] Sakane F, Imai S, Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401.
[3] Schaap D, de Widt J, van der Wal J, Vandekerckhove J, van Damme J, Gussow D, Ploegh
HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158.
[4] Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed) 

Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. 
The catalytic domain is assumed from the finding of bacterial homologues. 

[1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348.
[2] Sakane F, Imai S, Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401.
[3] Schaap D, de Widt J, van der Wal J, Vandekerckhove J, van Damme J, Gussow D, Ploegh
HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158.
[4] Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.


130. (DAO) D-amino acid oxidases signature

D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes
the oxidation of neutral and basic D-amino acids into their corresponding keto acids. DAOs
have been characterized and sequenced in fungi and vertebrates, where they are known to be
located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1) (DASOX) [1] is an enzyme,
structurally related to DAO, which catalyzes the same reaction but is active only toward
dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be
important for the enzyme's catalytic activity. The conserved region around this residue has
been developed as a signature pattern for these enzymes.

Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable
active site residue]

[1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).
[2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J.
Biochem. 109:171-177(1991).


131. DEAD and DEAH box families ATP-dependent helicases signatures

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis
of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid
unwinding. Proteins currently known to belong to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high
  molecular weight complex involved in 5' cap recognition and the binding of mRNA to
  ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the
  pre-mRNA splicing process.
- P110, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to P110.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and
  related to P110.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is
  involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic
  posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB
  gene for ribosomal protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably
  interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to
this family while others are shared by other ATP-binding proteins or by proteins belonging
to the helicases 'superfamily II' [4,E1]. One of these motifs, called the 'D-E-A-D-box',
represents a special version of the B motif of ATP-binding proteins. Some other proteins
belong to a subfamily whose members have His instead of the second Asp and are thus said to
be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily
are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
  ATP-requiring steps of the pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X
  chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV
  light, bulky adducts or cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA
  excision repair protein XPD (ERCC-2) are the homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell
  cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to
  initiate transcription from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns for both subfamilies were developed.

Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]

Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif
'A' (P-loop) (see the relevant entry <PDOC00017>).
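
Because the two subfamilies differ only in whether the fourth residue of the box is Asp or His, the two signatures can be used together to assign a sequence to one subfamily. The sketch below is an illustration under stated assumptions, not a method described in this specification: the regular expressions are hand translations of the two consensus patterns (assuming the last residue class of the first pattern closes after N), and the function name and test fragments are hypothetical.

```python
import re

# Hand-translated forms of the two consensus patterns quoted above.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase_box(sequence: str) -> str:
    """Return 'DEAD', 'DEAH', 'both' or 'none' for a protein sequence."""
    seq = sequence.upper()
    has_dead = DEAD_BOX.search(seq) is not None
    has_deah = DEAH_BOX.search(seq) is not None
    if has_dead and has_deah:
        return "both"
    if has_dead:
        return "DEAD"
    if has_deah:
        return "DEAH"
    return "none"


# Hypothetical fragments chosen only to exercise each branch.
for fragment in ("KYLVLDEADRMLDMG", "TGAVIIDEAHERTL", "MKTAYIAKQR"):
    print(fragment, classify_helicase_box(fragment))
```

A match to either signature is only suggestive; as the note above indicates, members of the family are also expected to carry the P-loop motif.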



[1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).
[2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski
P.P. Nature 337:121-122(1989).
[3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase

3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate
and serves as the biosynthetic precursor for the xylene ring of riboflavin. Sometimes found as
a bifunctional enzyme with GTP_cyclohydro2.

Richter G, Krieger C, Volk R, Kis K, Ritz H, Gotze E, Bacher A, Methods Enzymol
1997;280:374-382.


133. (DHDPS) Dihydrodipicolinate synthetase signatures 

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plant
chloroplasts and in many bacteria (gene dapA), the first reaction specific to the biosynthesis
of lysine and of diaminopimelate. DHDPS is responsible for the condensation of aspartate
semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to the
enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally
related to DHDPS and probably also act via a similar catalytic mechanism:

- Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene nanA), which catalyzes the
  condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine
  3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH.

Two signature patterns for these enzymes were developed. The first one is centered on a
highly conserved region in the N-terminal part of these proteins. The second signature
contains a lysine residue which has been shown, in Escherichia coli dapA [2], to be the one
that forms a Schiff-base with the substrate.

Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]

Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-
[SGA]-[LIVMF]-K-[DEQAF]-[STAC] [K is involved in Schiff-base formation]

[1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).
[2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).
[3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).


134. (DHOdehase) Dihydroorotate dehydrogenase signatures 

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de
novo biosynthesis of pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase
is a ubiquitous FAD flavoprotein. In bacteria (gene pyrD), DHOdehase is located on the inner
side of the cytosolic membrane. In some yeasts, such as Saccharomyces cerevisiae (gene
URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria [1].
The sequence of DHOdehase is rather well conserved and two signature patterns specific to
this enzyme were developed. The first corresponds to a region in the N-terminal section
of the enzyme, while the second is located in the C-terminal section and seems to be part of
the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-
P-[RT]

Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A

[1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).

135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase 


136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures 

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C-5 Mtase) are enzymes that
specifically methylate the C-5 carbon of cytosines in DNA [1,2,3]. Such enzymes are found
in the contexts described below.

- As a component of type II restriction-modification systems in prokaryotes and some
  bacteriophages. Such enzymes recognize a specific DNA sequence where they methylate a
  cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes that
  recognize the same sequence. The sequences of a large number of type II C-5 Mtases are
  known.
- In vertebrates, there are a number of C-5 Mtases that methylate CpG dinucleotides. The
  sequence of the mammalian enzyme is known.

C-5 Mtases share a number of short conserved regions. Two of them were selected. The first
is centered around a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to
be involved in the catalytic mechanism; it appears to form a covalent intermediate with the
C6 position of cytosine. The second region is located at the C-terminal extremity in type II
enzymes.

Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the
active site residue]

Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-
x(3)-[LIVM]

[1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).
[2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G.
Nucleic Acids Res. 22:1-10(1994).
[3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).
[4] Chen L., McMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L.
Biochemistry 30:11018-11025(1991).


137. (DNAphotolyase) DNA photolyases class 2 signatures 

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair
enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a
near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of
the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its
activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an
oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin
chromophore appears to function as an antenna, while the FADH2 chromophore is thought to
be responsible for electron transfer. On the basis of sequence similarities [3], DNA
photolyases can be grouped into two classes. The second class contains enzymes from
Myxococcus xanthus, methanogenic archaebacteria, insects, fish and marsupial mammals. It
is not yet known what second cofactor is bound to class 2 enzymes. There are a number of
conserved sequence regions in all known class 2 DNA photolyases, especially in the
C-terminal part. Two of these regions were selected as signature patterns.

Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F

Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).
[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A.
EMBO J. 13:6143-6151(1994).

(DNAphotolyase2) DNA photolyases class 1 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair
enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a
near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of
the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its
activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an
oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin
chromophore appears to function as an antenna, while the FADH2 chromophore is thought to
be responsible for electron transfer. On the basis of sequence similarities [3], DNA
photolyases can be grouped into two classes. The first class contains enzymes from
Gram-negative and Gram-positive bacteria, the halophilic archaebacterium Halobacterium
halobium, fungi and plants. Class 1 enzymes bind either 5,10-MTHF (E. coli, fungi, etc.) or
8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis cryptochromes 1
(CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced
gene expression. There are a number of conserved sequence regions in all known class 1 DNA
photolyases, especially in the C-terminal part. Two of these regions were selected as
signature patterns.

Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).
[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A.
EMBO J. 13:6143-6151(1994).
[4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

138. (DNA_pol_A)

DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate
replication of DNA. They require either a small RNA molecule or a protein as a primer for
the de novo synthesis of a DNA chain. On the basis of sequence similarities a number of
DNA polymerases have been grouped together [1,2,3] under the designation of DNA
polymerase family A. The polymerases that belong to this family are listed below.

- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage SPO1 polymerase.
- Bacteriophage SPO2 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).

Five regions of similarity are found in all the above polymerases. One of these conserved
regions, known as 'motif B' [1], is located in a domain which, in Escherichia coli polA, has
been shown to bind deoxynucleotide triphosphate substrates. It contains a conserved tyrosine
which has been shown, by photo-affinity labelling, to be in the active site. A conserved
lysine, also part of this motif, can be chemically labelled using pyridoxal phosphate. This
conserved region was used as a signature for this family of DNA polymerases.

Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-
[LIVMA]
Sequences known to belong to this class detected by the pattern: ALL.

[1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).


139. DNA_pol_viral_C

DNA polymerase (viral) C-terminal domain 
Number of members: 128 


140. (DNA_topoisoII) 

DNA topoisomerase II signature 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are
ATP-dependent and act by passing a DNA segment through a transient double-strand break.
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African
Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits
(the products of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme,
known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some
bacteria, a second type II topoisomerase has been identified; it is known as topoisomerase IV
and is required for chromosome segregation. It also consists of two subunits (genes parC and
parE). In eukaryotes, type II topoisomerase is a homodimer.

There are many regions of sequence homology between the different subtypes of
topoisomerase II. The relation between the different subunits is shown in the following
representation:

<----------------------About-1400-residues---------------------->

[----------Protein 39-*----][----------Protein 52-----------]   Phage T4
[----------gyrB-------*----][----------gyrA-----------------]   Prokaryote II, Archaebacteria
[----------parE-------*----][----------parC-----------------]   Prokaryote IV
[---------------------*-------------------------------------]   Eukaryote and ASF

'*': Position of the pattern.

As a signature pattern for this family of proteins, a region that contains a highly conserved
pentapeptide was selected. The pattern is located in gyrB, in parE, and in protein 39 of phage
T4 topoisomerase.

Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]
Sequences known to belong to this class detected by the pattern: ALL.

[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).
[3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[4] Roca J. Trends Biochem. Sci. 20:156-160(1995).

141. (DSPc) Tyrosine specific protein phosphatases signature and profiles 

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that
catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are
very important in the control of cell growth, proliferation, differentiation and transformation.
Multiple forms of PTPase have been characterized and can be classified into two categories:
soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below.

Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain
  (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain
  two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein
  corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP
  kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on
  both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable length extracellular
domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain.
Some of the receptor PTPases contain fibronectin type III (FN-III) repeats,
immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their
extracellular region. The cytoplasmic region generally contains two copies of the PTPase
domain. The first seems to have enzymatic activity, while the second is inactive but seems to
affect substrate specificity of the first. In these domains, the catalytic cysteine is generally
conserved but some other, presumably important, residues are not. In the following table, the
domain structure of known receptor PTPases is shown:

                                          Extracellular          Intracellular
                                          Ig   FN-3  CAH  MAM    PTPase
Leukocyte common antigen (LCA) (CD45)      0     2    0    0       2
Leukocyte antigen related (LAR)            3     8    0    0       2
Drosophila DLAR                            3     9    0    0       2
Drosophila DPTP                            2     2    0    0       2
PTP-alpha (LRP)                            0     0    0    0       2
PTP-beta                                   0    16    0    0       1
PTP-gamma                                  0     1    1    0       2
PTP-delta                                  0    >7    0    0       2
PTP-epsilon                                0     0    0    0       2
PTP-kappa                                  1     4    0    1       2
PTP-mu                                     1     4    0    1       2
PTP-zeta                                   0     1    1    0       2

PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the
second one has been shown to be absolutely required for activity. Furthermore, a number of
conserved residues in its immediate vicinity have also been shown to be important. A
signature pattern for PTPase domains was derived, centered on the active site cysteine. There
are three profiles for PTPases. The first one spans the complete domain and is not specific to
any subtype. The second profile is specific to dual-specificity PTPases and the third one to
the PTP subfamily.

Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the
active site residue]

[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).

142. (DUF10) Uncharacterized protein family UPF0076 signature

The following uncharacterized proteins have been shown [1] to share regions of similarity:

- Goat antigen UK114, a human homolog and the corresponding rat protein, which is known as
  perchloric acid soluble protein (PSP1). PSP1 [2] may inhibit an initiation stage of cell-free
  protein synthesis.
- Mouse heat-responsive protein HRSP12.
- Yeast chromosome V hypothetical protein YER057c.
- Yeast chromosome IX hypothetical protein YIL051c.
- Caenorhabditis elegans hypothetical protein C23G10.2.
- Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR.
- Escherichia coli hypothetical protein yjgF and HI0719, the corresponding Haemophilus
  influenzae protein.
- Escherichia coli hypothetical protein yoaB.
- Bacillus subtilis hypothetical protein yabJ.
- Haemophilus influenzae hypothetical protein HI1627.
- Helicobacter pylori hypothetical protein HP0944.
- Lactococcus lactis aldR.
- Myxococcus xanthus dfrA.
- Synechocystis strain PCC 6803 hypothetical protein slr0709.
- Rhizobium strain NGR234 symbiotic plasmid hypothetical protein y4sK.
- Pyrococcus horikoshii hypothetical protein PH0854.

These are small proteins of around 15 Kd whose sequence is highly conserved. As a signature
pattern, a well conserved region located in the C-terminal part of these proteins was selected.

Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-
x(5,8)-[LIVM]-E-[MI]

[1] Bairoch A. Unpublished observations (1995).
[2] Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol.
Chem. 270:30060-30067(1995).

143. (DUF3) Domain of Unknown Function 3

Domain apparently occurring exclusively in eubacteria. Unknown
function.

144. (DUF6) Integral membrane protein

This family includes many hypothetical membrane proteins of unknown function. 
Many of the proteins contain two copies of the aligned region. 


145. (DUF7) Integral membrane protein 

This family includes many hypothetical membrane proteins of unknown function. 
Swiss:P14502 has been implicated in resistance to ethidium bromide.



146. (DapB) Dihydrodipicolinate reductase signature 

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of
diaminopimelic acid and lysine, the NAD or NADP-dependent reduction of
2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This enzyme is present in bacteria
(gene dapB) and higher plants. As a signature pattern the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding
region [1].

Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A

[1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).



147. DedA family 

This family combines the DedA related proteins and YIAN/YGIK family. Members
of this family are not functionally characterised. These proteins contain multiple predicted
transmembrane regions.



148. DegT/DnrJ/EryC1/StrS family

The members of this family exhibit some characteristics of the sensor protein of
two-component signal transduction systems; however, none of the members shows any
sequence similarity to these protein kinases. The members of this family do have the typical
helix-turn-helix motif of DNA binding proteins.

[1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154.

149. (Desaturase) Fatty acid desaturases signatures

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond
at the delta position of fatty acids. There seem to be two distinct families of fatty acid
desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of
  unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9)
  position of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a
  membrane-bound enzyme that is thought to function as a part of a multienzyme complex in
  the endoplasmic reticulum of vertebrates and fungi.

As a signature pattern for this family a conserved region in the C-terminal part of these
enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:

- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze
  the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce
  oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to
  unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the
  delta(12) position of fatty acids bound to membrane glycerolipids. DesA is involved in
  chilling tolerance, the phase transition temperature of lipids of cellular membranes being
  dependent on the degree of unsaturation of fatty acids of the membrane lipids.

As a signature pattern for this family a conserved region in the C-terminal part of these
enzymes was selected.

Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).



150. Dihydroorotase signatures

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of
pyrimidine, the conversion of ureidosuccinic acid (N-carbamoyl-L-aspartate) into
dihydroorotate. Dihydroorotase binds a zinc ion which is required for its catalytic activity [1].
In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene
pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as
'rudimentary' in Drosophila and CAD in mammals, which catalyzes the first three steps of
pyrimidine biosynthesis [2]. The DHOase domain is located in the central part of this
polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4).
However, a defective DHOase domain [3] is found in a multifunctional protein (gene
URA2) that catalyzes the first two steps of pyrimidine biosynthesis. The comparison of
DHOase sequences from various sources shows [4] that there are two highly conserved
regions. The first, located in the N-terminal extremity, contains two histidine residues
suggested [3] to be involved in binding the zinc ion. The second is found in the C-terminal
part. Signature patterns for both regions have been developed. Allantoinase (EC 3.5.2.5) is
the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene DAL1) [5], it is the first
enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the
second step in the degradation of uric acid. The sequence of allantoinase is evolutionarily
related to that of DHOases.

Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two
H's are probable zinc ligands]
Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K
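
Several patterns in this section annotate particular residues, such as the two histidines of the first dihydroorotase signature that are described as probable zinc ligands. As a hedged illustration only, the sketch below uses capture groups to report where those annotated residues fall in a sequence once the signature matches. The regular expression is a hand translation of the first consensus pattern; the function name and the test sequence are hypothetical.

```python
import re

# First dihydroorotase signature; the two capture groups mark the
# histidines annotated as probable zinc ligands.
DHOASE_SIG1 = re.compile(r"D[LIVMFYWSAP](H)[LIVA](H)[LIVF][RN].[PGANF]")


def zinc_ligand_positions(sequence):
    """Return 1-based positions of the two annotated histidines, or None."""
    match = DHOASE_SIG1.search(sequence.upper())
    if match is None:
        return None
    # match.start(n) is 0-based, so add 1 for residue numbering.
    return (match.start(1) + 1, match.start(2) + 1)


# Hypothetical fragment used only to exercise the function.
print(zinc_ligand_positions("MKTDVHVHLRDGS"))
```

Reporting positions rather than a yes/no answer makes it straightforward to compare the annotated residues across aligned sequences.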

[1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[3] Souciet J.L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet.
212:134-141(1988).
[5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).
[6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda N., Rao M.S., Reddy
S.K., Noguchi T., Reddy J.K., Yeldandi A.Y. J. Biol. Chem. 269:12269-12276(1994).

151. dnaJ domains signatures and profile

The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK
protein [1]. Structurally, the dnaJ protein consists of an N-terminal conserved domain (called
'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues,
a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a
C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic
representation:

+ +-+ — - — + + + + | N-terminal 1 1 

Gly-R 1 1 CXXCXGXG | C-terminal | + +-+ +-— -+ + 

10 + 

It has been shown [2] that the T domain as well as the 'CRR' domain are also found in 
other prokaryotic and eukaryotic proteins which are listed below. 

a) Proteins containing both a 'J' and a 'CRR' domain: 

- Yeast protein MAS5/YDJ1 which seems to be involved in mitochondrial protein
  import.
- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.
- Yeast protein SCJ1, involved in protein sorting.
- Yeast protein XDJ1.
- Plants dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.
- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of
  soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic
  reticulum and the nucleus.
- Yeast protein SIS1, required for nuclear migration during mitosis.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA,
  whose function is not known, is associated with the membrane skeleton of newly
  invaded erythrocytes.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

A signature pattern for the 'J' domain was developed, based on conserved positions in
the C-terminal half of this domain. A pattern for the 'CRR' domain, based on the first two
copies of that motif, was also developed. A profile for the 'J' domain was also developed.

Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-
[KR]-x(2)-[FYI]
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-
x(2,3)-C-x-G-x-G

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).
[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).
[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-
1058(1994).

152. 

153. Dwarfin 

This family, known as the dwarfins, also includes the Drosophila protein MAD. The
N-terminus of MAD can bind to DNA [2].

[1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF, Proc Natl Acad
Sci U S A 1996;93:8940-8944.
[2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A, Nature 1997;388:304-308.

154. Dynein light chain type 1 signature 

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force
generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as
a motor for the intracellular retrograde motility of vesicles and organelles along microtubules.
Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits
and small subunits. Among the small subunits, there is a family [1,2] of highly conserved
proteins which consist of:

- Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains.
- Higher eukaryotes cytoplasmic dynein light chain 1.
- Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1).
- Caenorhabditis elegans hypothetical dynein light chains M18.2 and T26A5.9.

These proteins have from 89 to 120 amino acids. As a signature pattern, a highly conserved
region was selected.

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E

[1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).
[2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

155. dUTPase 

dUTPase hydrolyzes dUTP to dUMP and pyrophosphate. 

[1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature
1992;355:740-743.
[2] Mol CD, Harris JM, McIntosh EM, Tainer JA, Structure 1996;4:1077-1092.



156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region
signature

Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of
cytidine into uridine and ammonia while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP
deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to
require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine
residues which are thought to be involved in the binding of the catalytic zinc ion. Such a
region is also found in other proteins [3,4]:

- Yeast cytosine deaminase (EC 3.5.4.1) (gene FCY1) which transforms cytosine into uracil.
- Mammalian apolipoprotein B mRNA editing protein, responsible for the posttranscriptional
  editing of a CAA codon into a UAA (stop) codon in the APOB mRNA.
- Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-
  4(3H)-pyrimidinone 5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-
  pyrimidinedione 5'-phosphate.
- Bacillus cereus blasticidin-S deaminase (EC 3.5.4.23), which catalyzes the deamination of
  the cytosine moiety of the antibiotics blasticidin S, cytomycin and acetylblasticidin S.
- Bacillus subtilis protein comEB. This protein is required for the binding and uptake of
  transforming DNA.
- Bacillus subtilis hypothetical protein yaaJ.
- Escherichia coli hypothetical protein yfhC.
- Yeast hypothetical protein YJL035c.

A signature pattern for this zinc-binding region was derived.

Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-
x(3)-[LIVM] [The Cs and H are zinc ligands]

[1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).
[2] Moore J.T., Silversmith R.R., Maley G.F., Maley F. J. Biol. Chem. 268:2288-
2291(1993).
[3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).
[4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem.
Sci. 19:105-106(1994).

157. Dehydrins signatures

A number of proteins are produced by plants that experience water-stress. Water-stress takes
place when the water available to a plant falls below a critical level. The plant hormone
abscisic acid (ABA) appears to modulate the response of plants to water-stress. Proteins that
are expressed during water-stress are called dehydrins [1,2] or LEA group 2 proteins [3]. The
proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45),
  ERD14 and COR47.
- Barley dehydrins B8, B9, B17, and B18.
- Cotton LEA protein D-11.
- Craterostigma plantagineum desiccation-related proteins A and B.
- Maize dehydrin M3 (RAB-17).
- Pea dehydrins DHN1, DHN2, and DHN3.
- Radish LEA protein.
- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.
- Tomato TAS14.
- Wheat dehydrin RAB 15 and cold-shock protein cor410, cs66 and cs120.

Dehydrins share a number of structural features. One of the most notable
features is the presence, in their central region, of a continuous run of five to nine
serines followed by a cluster of charged residues. Such a region has been found in all
known dehydrins so far with the exception of pea dehydrins. A second conserved
feature is the presence of two copies of a lysine-rich octapeptide; the first copy is
located just after the cluster of charged residues that follows the poly-serine region
and the second copy is found at the C-terminal extremity. Signature patterns for both
regions were derived.

Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).
[3] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung
Z.R. Plant Mol. Biol. 12:475-486(1989).

158. (deoR) Bacterial regulatory proteins, deoR family signature 

The many bacterial transcription regulation proteins which bind DNA through a
'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence
similarities. One of these subfamilies groups the following proteins [1,2]:

- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and
  conjugal transfer.
- agaR, the Escherichia coli aga operon putative repressor.
- deoR, the Escherichia coli deoxyribose operon repressor.
- fucR, the Escherichia coli L-fucose operon activator.
- gatR, the Escherichia coli galactitol operon repressor.
- glpR, the Escherichia coli glycerol-3-phosphate regulon repressor.
- gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR, from Bacillus subtilis.
- lacR, the streptococci lactose phosphotransferase system repressor.
- spoIIID, the Bacillus subtilis transcription regulator of the sigK gene.
- yfjR, an Escherichia coli hypothetical protein.
- ygbI, an Escherichia coli hypothetical protein.
- yihW, an Escherichia coli hypothetical protein.
- yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein.

The 'helix-turn-helix' DNA-binding motif of these proteins is located in the N-terminal
part of the sequence. The pattern used to detect these proteins starts fourteen residues
before the HTH motif and ends one residue after it.

Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-
[KRNA]-D-[LIVMF]

[1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-
647(1992).
[2] Bairoch A. Unpublished observations (1993).

159. dsrm

Double-stranded RNA binding motif

[1] Burd CG, Dreyfuss G; Medline: 94310455, Conserved structures and diversity of
functions of RNA-binding proteins. Science 1994;265:615-621.

Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins
that bind to dsRNA. At least some DSRM proteins seem to bind to specific RNA targets.
Exemplified by Staufen, which is involved in localization of at least five different mRNAs in
the early Drosophila embryo. Also by interferon-induced protein kinase in humans, which is
part of the cellular response to dsRNA.

Number of members: 116 

160. Dynamin family signature

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is
involved in the production of microtubule bundles and which is able to bind and hydrolyze
GTP. Dynamin is structurally related to the following proteins:

- Drosophila shibire protein (gene shi) [3]. Shibire is, very probably, the Drosophila
  cognate of mammalian dynamin. It seems to provide the motor for vesicular transport
  during endocytosis.
- Yeast vacuolar sorting protein VPS1 (or SPO15) [4], a protein which could also be
  involved in microtubule-associated motility.
- Yeast protein MGM1 [5], which is required for mitochondrial genome maintenance.
- Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a
  family of closely related proteins. Most of these proteins are known to confer resistance
  to influenza viruses and/or rhabdoviruses on transfected mammalian cells in culture.

The three motifs found in all GTP-binding proteins are located in the N-terminal part of
these proteins. The signature pattern that was developed for these proteins is based on a
highly conserved region downstream of the ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>).

Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).
[2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-
261(1990).
[3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).
[4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-
1074(1990).
[5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).
[6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).
[7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).



161. (dynamin_2) Dynamin central region

This region lies between the GTPase domain, see dynamin, and the pleckstrin
homology (PH) domain.

162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl
phosphate intermediate in the course of ATP hydrolysis. ATPases which belong to this family
are listed below [1,2,3].

- Fungal and plant plasma membrane (H+) ATPases [reviewed in 4].
- Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR), the
  endoplasmic reticulum (ER) and the plasma membrane.
- Copper (Cu++) ATPases (copper pump) which are involved in two human genetic
  disorders: Menkes syndrome and Wilson disease [7].
- Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8].
- Bacterial magnesium (Mg++) ATPases.
- A probable cation ATPase from Leishmania.
- fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.

The region around the phosphorylated aspartate residue is perfectly conserved in all these
ATPases and can be used as a signature pattern.



Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated]
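
Because the annotated residue is the first position of this signature, the location of the phosphorylated aspartate follows directly from the start of each match. The sketch below is an illustration only: the function name phosphorylation_sites and the sequence fragment are assumptions made for the example, and the regular expression is a hand translation of the pattern above.

```python
import re
from typing import List

# Hand-written regex equivalent of D-K-T-G-T-[LI]-[TI].
P_TYPE_SITE = re.compile(r"DKTGT[LI][TI]")


def phosphorylation_sites(sequence: str) -> List[int]:
    """Return the 1-based position of the aspartate opening each signature match."""
    return [m.start() + 1 for m in P_TYPE_SITE.finditer(sequence)]


# Hypothetical fragment, invented only to exercise the pattern.
toy_sequence = "MAGICDKTGTLTQNRMSVV"
positions = phosphorylation_sites(toy_sequence)
print(positions)
```

Positions are reported 1-based, the usual convention for residue numbering; a sequence without the signature yields an empty list.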



[1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).
[6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).
[8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).



163. E1_N 

E1 Protein, N terminal domain
Number of members: 90 



164. (E1_dehydrog) Dehydrogenase E1 component

This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate
dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.



165. (ECH) Enoyl-CoA hydratase/isomerase signature 

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC
5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the
hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA and ECI shifts the 3-double bond
of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most
eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and
the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related
monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an
N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible
for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and
Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme
which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain [4]. A number
of other proteins have been found to be evolutionarily related to the ECH/ECI enzymes or
domains:

- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme
  involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial
  enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase
  converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase (EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes
  the conversion of 4-chlorobenzoate-CoA to 4-hydroxybenzoate-CoA.
- A Rhodobacter capsulatus protein of unknown function (ORF257) [7].
- Bacillus subtilis putative polyketide biosynthesis proteins pksH and pksI.
- Escherichia coli carnitine racemase (gene caiD) [8].
- Escherichia coli hypothetical protein ygfG.
- Yeast hypothetical protein YDR036c.

As a signature pattern for these enzymes, a conserved region rich in glycine and
hydrophobic residues was selected.

Consensus pattern: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-
[LIVMST]-x-[CSTA]-[DQHP]-[LIVMFY]

[1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-
78(1989).
[2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).
[3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2446-2449(1990).
[4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang
K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).
[8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol.
Microbiol. 13:775-786(1994).

166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of
aminoacyl-tRNAs to the ribosomes [1]. EF-1 is composed of four subunits: the alpha chain
which binds GTP and aminoacyl-tRNAs, the gamma chain that probably plays a role in
anchoring the complex to other cellular components and the beta and delta (or beta') chains.
The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP
bound to the alpha chain for GTP [2]. The beta and delta chains are hydrophilic proteins of
around 23 to 31 Kd. Their C-terminal part seems important for the nucleotide exchange
activity, while the N-terminal section is probably involved in the interaction with the gamma
chain. Two signature patterns for this family of proteins were developed. The first
corresponds to an acidic region in the central section; the second, to the C-terminal extremity
of these proteins.

Consensus pattern: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G
Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[1] Riis B., Rattan I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).
[2] van Damme H.T.F., Amons R., Karssies R., Timmers C.J., Janssen G.M.C., Moeller W.
Biochim. Biophys. Acta 1050:241-247(1990).

167. (EF1G_domain) Elongation factor 1 gamma, conserved domain

168. (EFG_C) Elongation factor G C-terminus

This family is always found associated with GTP_EFTU. This family includes the
carboxyl terminal regions of elongation factor G, elongation factor 2 and some tetracycline
resistance proteins.

169. (EFP) Elongation factor P signature

Elongation factor P (EF-P) [1] is a prokaryotic protein translation factor required for efficient 
peptide bond synthesis on 70S ribosomes from fMet-tRNAfMet. EF-P is a protein of 21 Kd. 
It is evolutionarily related to yeiP, a hypothetical protein from Escherichia coli. As a
signature pattern, a conserved region located in the C-terminal part of these proteins was
selected.

Consensus pattern: K-x-[AV]-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G

[1] Aoki H., Adams S.-L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).

170. (EF_TS) Elongation factor Ts signatures

In prokaryotes elongation factor Ts (EF-Ts) is a component of the elongation cycle of protein
biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP for
GTP; it remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP
hydrolysis stage on the ribosome [1]. EF-Ts is also a component of the chloroplast protein
biosynthetic machinery and is encoded in the genome of some algal chloroplasts [2]. It is also
present in mitochondria [3]. As signature patterns for EF-Ts, two conserved regions located
in the N-terminal part of the protein have been selected.

Consensus pattern: L-R-x(2)-T-[GSDNQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-
[KRNEQS]-A-L
Consensus pattern: E-[LIVM]-[NV]-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]

[1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).
[3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-
17249(1995).

171. (EMP24_GP25L) emp24/gp25L/p24 family 

Members of this family are implicated in bringing cargo forward from the ER and 
binding to coat proteins by their cytoplasmic domains. Number of members: 30 

Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T, J Cell Biol 1998;140:751-765. 



172. ENV_polyprotein

ENV polyprotein (coat polyprotein)
Number of members: 224 

173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures 

Two fungal enzymes involved in ergosterol biosynthesis and which act by reducing double
bonds in precursors of ergosterol have been shown to be evolutionarily related [1]. These are
C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in Neurospora crassa) and
C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission yeast). Their
sequences are also highly related to that of chicken lamin B receptor, which is thought to
anchor the lamina to the inner nuclear membrane. These proteins are highly hydrophobic and
seem to contain seven or eight transmembrane regions. As signature patterns, two conserved
regions were selected. The first one is apparently located in a loop between the fourth and
fifth transmembrane regions and the second is in the C-terminal section.

Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[1] Lai M.H., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R.
Gene 140:41-49(1994).

174. (ERM) Ezrin/radixin/moesin family 

This family of proteins contains a band 4.1 domain (Band_41) at their amino terminus.
This family represents the rest of these proteins.

[1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S, J Cell Biol
1998;140:885-895.

175. ER lumen protein retaining receptor signatures 

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal
tetrapeptide (generally K-D-E-L or H-D-E-L) that serves as a signal for their retrieval
(retrograde transport) from subsequent compartments of the secretory pathway. The signal is
recognized by a receptor molecule that is believed to cycle between the cis side of the Golgi
apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or
also as the 'KDEL receptor'. It has been characterized in a variety of species, including fungi
(gene ERD2), plants, Plasmodium, Drosophila and mammals. In mammals two highly related
forms of the receptor are known. Structurally, the receptor is a protein of about 220 residues
that seems to contain seven transmembrane regions [2]. The N-terminal part (3 residues) is
oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There
are three lumenal and three cytoplasmic loops. Two signature patterns for these receptors
were developed. The first pattern corresponds to the C-terminal half of the first cytoplasmic
loop as well as most of the second transmembrane domain. The second pattern is a perfectly
conserved decapeptide that corresponds to the central part of the fifth transmembrane
domain.

Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L

[1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).
[2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).

176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for
various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain
via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a
beta subunit and binds one molecule of FAD per dimer. A similar system also exists in
some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally
related to the bacterial nitrogen fixation protein fixA which could play a role in a redox
process and feed electrons to ferredoxin. Other related proteins are:

- Escherichia coli hypothetical protein ydiQ.
- Escherichia coli hypothetical protein ygcR.

As a signature pattern for these proteins, a conserved region which is located in the
central section was selected.

Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-
[LIVM](2)-[TAC]

[1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

177. Endonuclease III signatures

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that
acts both as a DNA N-glycosylase, removing oxidized pyrimidines from DNA, and as an
apurinic/apyrimidinic (AP) endonuclease, introducing a single-strand nick at the site from
which the damaged base was removed. Endonuclease III is an iron-sulfur protein that binds a
single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity,
but is probably involved in the proper positioning of the enzyme along the DNA strand
[2]. Endonuclease III is evolutionarily related to the following proteins:

- Fission yeast endonuclease III homolog (gene nth1) [3].
- Escherichia coli and related protein DNA repair protein mutY, which is an adenine
  glycosylase. MutY is a larger protein (350 amino acids) than endonuclease III (211 amino
  acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase which initiates repair at cis-syn
  pyrimidine dimers.
- ORF10 in plasmid pFV1 of the thermophilic archaebacteria Methanobacterium
  thermoformicicum [4]. Restriction methylase m.MthTI, which is encoded by this plasmid,
  generates 5-methylcytosine which is subject to deamination resulting in G-T mismatches.
  This protein could correct these mismatches.
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein SpAC26A3.02.
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methanococcus jannaschii hypothetical protein MJ0613.

The 4Fe-4S cluster is bound by four cysteines which are all located in a 17 amino acid
region at the C-terminal end of endonuclease III. A similar region is also present in the
central section of mutY and in the C-terminus of ORF10 and of the Micrococcus UV
endonuclease. The 4Fe-4S cluster region does not exist in YAL015c. Two signature patterns
for these proteins were developed: the first corresponds to the core of the iron-sulfur
binding domain, the second corresponds to the best conserved region in the catalytic core
of these enzymes.

Consensus pattern: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C [The four Cs are 4Fe-4S
ligands]
Consensus pattern: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-
[GAC]-x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK]

[1] Kuo C.-F., McRee D., Fisher C.L., O'Handley S.F., Cunningham R.P., Tainer J.A. Science
258:434-440(1992).
[2] Thomson A.J. Curr. Biol. 3:173-174(1993).
[3] Roldan-Arjona T., Anselmino C., Lindahl T. Nucleic Acids Res. 3307-3312(1996).
[4] Noelling J., van Eeden F.J.M., Eggen R.I.L., de Vos W.M. Nucleic Acids Res. 20:6501-
6507(1992).

178. (Epimerase) NAD dependent epimerase/dehydratase family 

This family of proteins utilizes NAD as a cofactor. The proteins in this family use
nucleotide-sugar substrates for a variety of chemical reactions. 

[1] Thoden JB, Hegeman AD, Wesenberg G, Chapeau MC, Frey PA, Holden HM, 
Biochemistry 1997;36:6294-6304. 

179. Exonuclease 

This family includes a variety of exonuclease proteins, such as ribonuclease T and the 
epsilon subunit of DNA polymerase III. 

[1] Koonin EV, Deutscher MP, Nucleic Acids Res 1993;21:2521-2522. 

180. ENTH 
ENTH domain 

[1] Kay BK, Yamabhai M, Wendland B, Emr SD; Medline: 99156083, Identification of a 
novel domain shared by putative components of the endocytic and cytoskeletal machinery. 
Protein Sci 1999;8:435-438. 

The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis 
and cytoskeletal machinery. The function of the ENTH domain is unknown. 



Number of members: 29

181. (eIF-1A) Eukaryotic initiation factor 1A signature

Eukaryotic translation initiation factor 1A (eIF-1A) [1] (formerly known as eIF-4C) is a
protein that seems to be required for maximal rate of protein biosynthesis. It enhances
ribosome dissociation into subunits and stabilizes the binding of the initiator Met-tRNA to
40S ribosomal subunits. eIF-1A is a hydrophilic protein of about 15 to 17 Kd. Archaebacteria
also seem to possess an eIF-1A homolog. As a signature pattern, a conserved region in the
central section of these proteins was selected.

Consensus pattern: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G

[1] Wei C.-L., Kainuma M., Hershey J.W.B. J. Biol. Chem. 270:22788-22794(1995).

182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature 

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] is a small protein
whose precise role in the initiation of protein synthesis is not known. It appears to promote
the formation of the first peptide bond. eIF-5A seems to be the only eukaryotic protein to
contain a hypusine residue. Hypusine is derived from lysine by the post-translational
addition of a butylamino group (from spermidine) to the epsilon-amino group of lysine. The
hypusine group is essential to the function of eIF-5A. A hypusine-containing protein has been
found in archaebacteria such as Sulfolobus acidocaldarius or Methanococcus jannaschii; this
protein is highly similar to eIF-5A and could play a similar role in protein biosynthesis. The
signature developed for eIF-5A is centered around the hypusine residue.

Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]

[1] Park M.H., Wolff E.C., Folk J.E. Biofactors 4:95-104(1993).
[2] Schnier J., Schwelberger H.G., Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell.
Biol. 11:3105-3114(1991).

183. (efhand) S-100/ICaBP type calcium binding protein signature 

S-100 proteins are small, dimeric, acidic calcium- and zinc-binding proteins [1] abundant in the brain.
They have two different types of calcium-binding sites: a low-affinity one with a special
structure and a 'normal' EF-hand type high-affinity site. The vitamin-D dependent intestinal
calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of proteins,
but they do not form dimers. In the past years the sequences of many new members of this
family have been determined (for reviews see [2,3,4]); in most cases the function of these
proteins is not yet known, although it is becoming clear that they are involved in cell growth
and differentiation, cell cycle regulation and metabolic control. These proteins are:

- Calcyclin (Prolactin receptor associated protein (PRA); clatropin; 2a9; 5B10; S100A6).
- Calpactin I light chain (p10; p11; 42c; S100A10).
- Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8).
- Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9).
- Calgranulin C.
- Calgizzarin (S100C).
- Placental calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastatin; S100A4).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19; S100A2).
- Placental protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- Protein MRP-126 [6].
- Trichohyalin [7]. This is a large intermediate filament-associated protein that associates
  with keratin intermediate filaments (KIF); it contains an S-100 type domain in its
  N-terminal extremity.

A number of these proteins are known to bind calcium while others are not (p10, for example).
Our EF-hand detecting pattern will fail to pick up those proteins which have lost their
calcium-binding properties. A pattern was therefore developed which unambiguously picks up
proteins belonging to this family. This pattern spans the region of the EF-hand high-affinity
site but makes no assumptions about the calcium-binding properties of this site.

Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[ 1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bollis L., Giller R.,
Eds., pp102-113, Springer Verlag, Berlin, (1988).

[ 2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990). 

[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988). 

[ 4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics
25:638-643(1995).

[ 5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 
267:7499-7504(1992). 

[ 6] Nakano T., Graf T. Oncogene 7:527-534(1992). 
[ 7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol.
Chem. 268:12164-12176(1993).

EF-hand calcium-binding domain 

Many calcium-binding proteins belong to the same evolutionary family and share
a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of
domain consists of a twelve residue loop flanked on both sides by a twelve
residue alpha-helical domain. In an EF-hand loop the calcium ion is
coordinated in a pentagonal bipyramidal configuration. The six residues
involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues
are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12
provides two oxygens for liganding Ca (bidentate ligand).
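
The position-to-label mapping described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the example loop is a twelve-residue string assumed here to resemble a calmodulin-type EF-hand loop rather than one taken from the specification.

```python
# Loop positions (1-based) that coordinate calcium, with their conventional labels.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop: str) -> dict:
    """Return the residue found at each calcium-coordinating position of a 12-residue loop."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to span twelve residues")
    return {label: loop[position - 1] for position, label in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop: str) -> bool:
    """Check that position 12 (-Z) is Glu or Asp, the bidentate ligand."""
    return ef_hand_ligands(loop)["-Z"] in "DE"


EXAMPLE_LOOP = "DKDGDGTITTKE"  # assumed, illustrative loop sequence
ligands = ef_hand_ligands(EXAMPLE_LOOP)
print(ligands)
print(has_bidentate_ligand(EXAMPLE_LOOP))
```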

Listed below are the proteins which are known to contain EF-hand regions. For
each type of protein the total number of EF-hand regions known or supposed to exist
is indicated in parentheses. This number does not include
regions which clearly have lost their calcium-binding properties, or the
atypical low-affinity site (which spans thirteen residues) found in the S-100/
ICaBP family of proteins [6].

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (lfiB) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that
  contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the
  entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced
  protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine protein phosphatase rdgc (EC 3.1.3.16) from Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from
  arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick up EF-hand
regions, but these studies were made a few years ago when not so many
different families of calcium-binding proteins were known. Therefore
a new pattern was developed which takes into account all published sequences. This
pattern includes the complete EF-hand loop as well as the first residue which
follows the loop and which seems always to be hydrophobic.

-Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

-Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved. 

-Note: the 6th residue in an EF-hand loop is, in most cases, a Gly, but the number of
exceptions to this 'rule' has gradually increased and therefore the pattern should include all
the different residues which have been shown to exist in this position in functional Ca-binding
sites.

-Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins 
with multiple EF-hand domains. 
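
As a hedged illustration of how the EF-hand pattern above might be applied, the sketch below hand-translates it into a regular expression and scans a sequence with a lookahead so that overlapping candidates are not skipped. The test fragment is an assumption: it is modelled on the N-terminal region of a vertebrate calmodulin and should be checked against a database entry before being relied on. The names `EF_HAND`, `FRAGMENT` and `find_ef_hands` are hypothetical.

```python
import re

# Hand translation of the EF-hand consensus pattern into regex syntax:
# x -> ".", {ABC} -> "[^ABC]", x(2) -> ".{2}".
EF_HAND = "D.[DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC].{2}[DE][LIVMFYW]"

# Wrapping the pattern in a lookahead makes every start position a candidate,
# so overlapping occurrences are reported as well.
EF_HAND_SCAN = re.compile("(?=(" + EF_HAND + "))")

# Assumed test fragment, split so the two loop regions are easy to see.
FRAGMENT = (
    "MADQLTEEQIAEFKEAFSLF"
    "DKDGDGTITTKEL"
    "GTVMRSLGQNPTEAELQDMINEV"
    "DADGNGTIDFPEF"
    "LTMMARK"
)


def find_ef_hands(sequence: str) -> list:
    """Return (1-based start position, matched segment) for each pattern hit."""
    return [(m.start() + 1, m.group(1)) for m in EF_HAND_SCAN.finditer(sequence)]


hits = find_ef_hands(FRAGMENT)
for start, segment in hits:
    print(start, segment)
```

Each reported segment spans thirteen residues: the twelve-residue loop plus the hydrophobic residue that follows it.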

[ 1] Kawasaki H., Kretsinger R.H. Protein Prof. 2:305-490(1995).
[ 2] Kretsinger R.H. Cold Spring Harbor Symp. Quant. Biol. 52:499-510(1987).
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985). 

[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. 
Biochem. J. 265:261-265(1990). 

[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).

184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate
to phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both
for catalysis and for stabilizing the dimer. Enolase is probably found in all organisms that
metabolize sugars. In vertebrates, there are three different tissue-specific isozymes: alpha,
present in most tissues; beta, in muscles; and gamma, found only in nervous tissues.
Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2]
to be evolutionarily related to enolase. As a signature pattern for enolase, the best conserved
region was selected; it is located in the C-terminal third of the sequence.

Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]

[ 1] Lebioda L., Stec B., Brewer J.M. J. Biol. Chem. 264:3685-3693(1989).
[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).

185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends 
of actin filaments (barbed end) thereby blocking the exchange of subunits at these ends. 
Unlike gelsolin and severin this protein does not sever actin filaments. The F-actin capping 
protein is a heterodimer composed of two unrelated subunits: alpha and beta. The alpha
subunit is a protein of about 268 to 286 amino acid residues whose sequence is well 
conserved in eukaryotic species [1]. As signature patterns two highly conserved regions in the 
C-terminal section of the alpha subunit were selected. 

Consensus pattern: V-H-[FY](2)-E-D-G-N-V 
Consensus pattern: F-K-[AE]-L-R-R-x-L-P- 

[ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. 
Cell Motil. Cytoskeleton 18:204-214(1991). 

186. F-box domain 

[1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ, Cell
1996;86:263-274.
[2] Skowyra D, Craig KL, Tyers M, Elledge SJ, Harper JW, Cell
1997;91:209-219.

187. F-protein 

Negative factor, (F-Protein) or Nef. 

[1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C; Medline: 
98035457, The crystal structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain 
suggests a role for this complex in altered T cell receptor signalling Structure 1997;5:1361- 
1372. 

Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins 
involved in signal transduction and host cell activation. Nef has been shown to bind 
specifically to a subset of the Src kinase family. 
Number of members: 1013

188. (FAD_binding_2)

Fumarate reductase / succinate dehydrogenase FAD-binding site 

In bacteria two distinct, membrane-bound, enzyme complexes are responsible for the 
interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd) is used in 
anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both 
complexes consist of two main components: a membrane-extrinsic component composed of a 
FAD-binding flavoprotein and an iron-sulfur protein; and a hydrophobic component
composed of a membrane anchor protein and/or a cytochrome B. 

In eukaryotes mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme 
composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound 
to a histidine residue which is located in the N-terminal section of the protein [1]. The 
sequence around that histidine is well conserved in Frd and Sdh from various bacterial and 
eukaryotic species [2] and can be used as a signature pattern. 

Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. 
Chem. 264:13599-13604(1989). 

[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., 
Aitken A., Diamond A.G., Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992). 

189. Fatty acid desaturases signatures (FA_desaturase) 

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond
at the delta position of fatty acids. There seem to be two distinct families of fatty acid
desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of
  unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position
  of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound
  enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic
  reticulum of vertebrates and fungi.

As a signature pattern for this family a conserved region in the C-terminal part of these
enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:

- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these enzymes catalyze
  the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce
  oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to
  unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the
  delta(12) position of fatty acids bound to membrane glycerolipids. DesA is involved in
  chilling tolerance; the phase transition temperature of lipids of cellular membranes
  depends on the degree of unsaturation of the fatty acids of the membrane lipids.

As a signature pattern for this family a conserved region in the C-terminal part of these
enzymes was selected.

Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y- 

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755- 
14761(1989). 

[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991). 
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990). 

190. Fructose-1,6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in
gluconeogenesis, catalyzes the hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate.
It is involved in many different metabolic pathways and found in most
organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme found in
plant chloroplasts and in photosynthetic bacteria that catalyzes the hydrolysis of sedoheptulose
1,7-bisphosphate to sedoheptulose 7-phosphate, a step in Calvin's reductive pentose
phosphate cycle. It is functionally and structurally related to FBPase. In mammalian FBPase,
a lysine residue has been shown to be involved in the catalytic mechanism [3]. The region
around this residue is highly conserved and can be used as a signature pattern for FBPase and
SBPase. It must be noted that, in some bacterial FBPase sequences, the active site lysine is
replaced by an arginine.

Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the
active site residue]

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982). 

[ 2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem.
205:1053-1059(1992).

[ 3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-
539(1989).

191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily
related. These enzymes are:

- L-fuculokinase (EC 2.7.1.51) (gene fucK).
- Gluconokinase (EC 2.7.1.12) (gene gntK).
- Glycerokinase (EC 2.7.1.30) (gene glpK).
- Xylulokinase (EC 2.7.1.17) (gene xylB).
- L-xylulose kinase (EC 2.7.1.53) (gene lyxK).

These enzymes are proteins of 480 to 520 amino acid residues. As consensus patterns for this
family of kinases two conserved regions were selected, one in the central section, the other
in the C-terminal section.

Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH]-

Consensus pattern: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]-[LIVMFY]-[DEQ]-

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991). 



192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)

FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the
immunosuppressive drug FK506. It exhibits peptidyl-prolyl cis-trans isomerase activity (EC
5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates protein folding by
catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At
least three different forms of FKBP are known in mammalian species:

- FKBP-12, which is cytosolic and inhibited by both FK506 and rapamycin.
- FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin.
- FKBP-25, which is preferentially inhibited by rapamycin.

These forms of FKBP are evolutionarily related and show extensive similarities [5,6,7] with
the following proteins:

- Fungal FKBP.
- Mammalian hsp binding immunophilin (HBI) (also called p59). HBI is a protein which binds
  to hsp90 and contains two FKBP-like domains in its N-terminal section, the first of which
  seems to be functional.
- The C-terminal part of the cell-surface protein mip from Legionella; a protein associated
  with macrophage infection by an unknown mechanism.
- Escherichia coli slyD [8], a protein with an N-terminal FKBP domain followed by a
  histidine-rich metal-binding domain.
- Escherichia coli fkpA.
- Escherichia coli fklB (FKBP22).
- Escherichia coli slpA.
- Bacterial trigger factor (Tig).
- Streptomyces hygroscopicus and chrysomallus FK506-binding protein.
- Chlamydia trachomatis 27 Kd membrane protein.
- Neisseria meningitidis strain C114 PPiase.
- Probable PPiases from Haemophilus influenzae (HI0754), Methanococcus jannaschii
  (MJ0278 and MJ0825), Pseudomonas fluorescens and Pseudomonas aeruginosa.

Two signature patterns for these proteins were developed. One is based on a conserved region
in the N-terminus of FKBP, the other is located in the central section. The profile for FKBP
spans the complete domain.

Consensus pattern: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]-

Consensus pattern: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G-

[ 1] Tropschug M., Wachter E., Mayer S., Schoenbrunner E.R., Schmid F.X. Nature 346:674- 
677(1990). 

[ 2] Stein R.L. Curr. Biol. 1:234-236(1991). 

[ 3] Siekierka J.J., Widerrecht G., Greulich H., Boulton D., Hung S.H.Y., Cryan J., Hodges
P.J., Sigal N.H. J. Biol. Chem. 265:21011-21015(1990).

[ 4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990). 
[ 5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).

[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).

[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 8] Wuelfing C., Lomardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

193. MAPEG family (aka: FLAP/GST2/LTC4S family signature) 

The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes 
the production of LTC4 from LTA4. 

- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an
enzyme that can also produce LTC4 from LTA4.

- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be 
required for the activation of 5-lipoxygenase. 

These are proteins of 150 to 160 residues that contain three transmembrane segments. 
As a signature pattern, a conserved region between the first and second transmembrane 
domains was selected. 

Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203- 
22210(1996). 

194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing
flavoproteins have been shown [1,2,3] to be structurally related; these enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3), which consists of a dehydrogenase domain and a
  heme-binding domain called cytochrome b2 and which catalyzes the conversion of lactate
  into pyruvate.
- Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that
  catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide.
- Long chain alpha-hydroxy acid oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme.
- Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis,
  which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and water.
- (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the
  oxidation of (S)-mandelate to benzoylformate.

The first step in the reaction mechanism of these enzymes is the abstraction of the proton
from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach
to the N5 atom of FMN. A conserved histidine has been shown [4] to be involved in the
removal of the proton. The region around this active site residue is highly conserved and
contains an arginine residue which is involved in substrate binding.

Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding
residue]

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990). 
[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. 
Biochemistry 29:9856-9862(1990). 

[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991). 
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989). 

195. Flavin-binding monooxygenase-like (FMO-like) 

This family includes FMO proteins, cyclohexanone monooxygenase 

196. (FPGS) 

Folylpolyglutamate synthase signatures (aka Mur_ligase)

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism 
that catalyzes ATP-dependent addition of glutamate moieties to tetrahydrofolate. 

Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We 
developed two signature patterns based on the conserved regions which are rich in glycine 
residues and could play a role in the catalytic activity and/or in substrate binding.



Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]
Sequences known to belong to this class detected by the pattern: ALL.

Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. 
Med. Biol. 338:629-634(1993). 



197. FYVE zinc finger 

The FYVE zinc finger is named after four proteins that it has been found in: Fab1,
YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two Zn++
ions [1]. The FYVE finger has eight potential zinc coordinating cysteine positions. Many 
members of this family also include two histidines in a motif R+HHC+XCG, where + 
represents a charged residue and X any residue. Members were included which do not 
conserve these histidine residues but are clearly related. 

[1] Stenmark H, Aasland R, Toh BH, D'Arrigo A, J Biol Chem 1996;271:24048-
24054.
[2] Gaullier JM, Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R,
Nature 1998;394:432-433.
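
The histidine-containing motif R+HHC+XCG described for the FYVE finger can be written as a regular expression. The sketch below is illustrative only: which residues count as "charged" is an assumption (D, E, K and R are used here), the test sequences are invented, and the names `FYVE_MOTIF` and `find_fyve_motif` are hypothetical.

```python
import re

# Assumption: '+' in R+HHC+XCG stands for one of the charged residues D, E, K or R.
CHARGED = "DEKR"

# R, charged, H, H, C, charged, any residue, C, G
FYVE_MOTIF = re.compile(f"R[{CHARGED}]HHC[{CHARGED}].CG")


def find_fyve_motif(sequence: str):
    """Return (1-based start, matched segment) for the first motif hit, or None."""
    m = FYVE_MOTIF.search(sequence)
    if m is None:
        return None
    return (m.start() + 1, m.group())


# Invented test sequences, not taken from any real protein.
WITH_HISTIDINES = "GSW" + "RKHHCRACG" + "NIF"
WITHOUT_HISTIDINES = "GSW" + "RKAACRACG" + "NIF"

print(find_fyve_motif(WITH_HISTIDINES))
print(find_fyve_motif(WITHOUT_HISTIDINES))
```

Because some family members do not conserve the two histidines, a search of this kind would miss them; it is a quick screen for the motif, not a test for family membership.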

198. F_actin_cap_B 

F-actin capping protein beta subunit signature 

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends 
of actin filaments (barbed end) thereby blocking the exchange of subunits at these ends. 
Unlike gelsolin and severin this protein does not sever actin filaments. The F-actin capping 
protein is a heterodimer composed of two unrelated subunits: alpha and beta. 

The beta subunit is a protein of about 280 amino acid residues whose sequence is well 
conserved in eukaryotic species [1]. As a signature pattern a conserved hexapeptide in the N- 
terminal section of the beta subunit was selected. 

Consensus pattern: C-D-Y-N-R-D
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).

199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and
cephalosporin. In the presence of oxygen, iron and ascorbate, it removes four hydrogen atoms
from L-(alpha-aminoadipyl)-L-cysteinyl-d-valine to form the azetidinone and thiazolidine
rings of isopenicillin. IPNS is an enzyme of about 330 amino-acid residues. Two cysteines
are conserved in fungal and bacterial IPNS sequences; these may be involved in iron-binding
and/or substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional
enzyme involved in cephalosporin biosynthesis. The DAOCS domain, which is structurally
related to IPNS, catalyzes the step from penicillin N to deacetoxy-cephalosporin C, which is
used as a substrate by DACS to form deacetylcephalosporin C. Streptomyces clavuligerus
possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature
patterns for these enzymes were derived, centered around the conserved cysteine residues.

Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]- 

Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]- 

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987). 

[ 2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105- 
111(1990). 

[ 3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh
W.K., Miller J.R., Queener S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).

[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-
760(1989).

200. Fibrillarin signature 

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle
thought to participate in the first step of the processing of pre-rRNA. In mammals, fibrillarin
is associated with the U3, U8 and U13 small nuclear RNAs [2]. Fibrillarin is an extremely
well conserved protein of about 320 amino acid residues. Structurally it consists of three
different domains:

- An N-terminal domain of about 80 amino acids which is very rich in glycine and contains a
  number of dimethylated arginine residues (DMA).
- A central domain of about 90 residues which resembles that of RNA-binding proteins and
  contains an octameric sequence similar to the RNP-2 consensus found in such proteins.
- A C-terminal alpha-helical domain.

A protein evolutionarily related to fibrillarin has been found [3] in archaebacteria such as
Methanococcus vannielii or voltae. This protein (gene flpA) is involved in pre-rRNA
processing. It lacks the Gly/Arg-rich N-terminal domain. As a signature pattern, a region was
selected that starts with and encompasses the RNP-2 like octapeptide sequence.

Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE] - 

[ 1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991). 

[ 2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). 

[ 3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

201. Filamin/ABP280 repeat 

[1] Fucini P, Renner C, Herberhold C, Noegel AA, Holak TA, Nat Struct Biol
1997;4:223-230.

202. Fucosyl transferase 

This family of fucosyltransferases comprises the enzymes that transfer
fucose from GDP-fucose to GlcNAc in an alpha1,3 linkage [1].

[1] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87-94. 

203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide
variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending
upon the physiological nature of the iron-sulfur cluster(s) and according to sequence
similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are proteins or
domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur
cluster. The proteins that are known [2] to belong to this family are listed below.

- Ferredoxin from photosynthetic organisms; namely plants and algae, where it is located in
  the chloroplast or cyanelle; and cyanobacteria.
- Ferredoxin from archaebacteria of the Halobacterium genus.
- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus.
- Ferredoxin in the toluene degradation operon (gene xylT) and naphthalene degradation
  operon (gene nahT) of Pseudomonas putida.
- Hypothetical Escherichia coli protein yfaE.
- The N-terminal domain of the bifunctional ferredoxin/ferredoxin reductase electron transfer
  component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter
  calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate
  1,2-dioxygenase system (gene xylZ), and the xylene monooxygenase system (gene xylA)
  from Pseudomonas.
- The N-terminal domain of phenol hydroxylase protein p5 (gene dmpP) from Pseudomonas putida.
- The N-terminal domain of methane monooxygenase component C (gene mmoC) from
  Methylococcus capsulatus.
- The C-terminal domain of the vanillate degradation pathway protein vanB in a
  Pseudomonas species.
- The N-terminal domain of bacterial fumarate reductase iron-sulfur protein (gene frdB).
- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from
  Yersinia pseudotuberculosis.
- The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein.
- The N-terminal domain of eukaryotic xanthine dehydrogenase.
- The N-terminal domain of eukaryotic aldehyde oxidase.

In the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of
these cysteines are clustered together in the same region of the protein. Our signature
pattern spans that iron-sulfur binding region.

Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's
are 2Fe-2S ligands]

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
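
Because the 2Fe-2S ferredoxin pattern marks three cysteines as cluster ligands, a match can be used to report their positions. The sketch below hand-translates the pattern into a regular expression with one capture group per ligand cysteine. The test fragment is an assumption modelled on plant-type ferredoxins and is for illustration only; the names `FER2_PATTERN` and `ligand_cysteines` are hypothetical.

```python
import re

# C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C, with each ligand cysteine captured.
FER2_PATTERN = re.compile(r"(C)[^C][^C][GA][^C](C)[GAST][^CPDEKRHFYW](C)")


def ligand_cysteines(sequence: str):
    """Return the 1-based positions of the three clustered Cys ligands, or None."""
    m = FER2_PATTERN.search(sequence)
    if m is None:
        return None
    return tuple(m.start(group) + 1 for group in (1, 2, 3))


# Assumed, illustrative fragment; the middle piece is the region that matches.
FRAGMENT = "AAEEAGIDLPYS" + "CRAGSCSSC" + "AGKV"

print(ligand_cysteines(FRAGMENT))
```

The fourth cysteine ligand lies outside the region spanned by the pattern, so it is not reported by this search.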

Adrenodoxin family, iron-sulfur binding region signature (fer2B) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide 
variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending 
upon the physiological nature of the iron sulfur cluster(s) and according to sequence 
similarities. One family of ferredoxins groups together the following proteins that all bind a 
single 2Fe-2S iron-sulfur cluster: - Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate 
mitochondrial protein which transfers electrons from adrenodoxin reductase to cytochrome 
P450scc, which is involved in cholesterol side chain cleavage. - Putidaredoxin (PTX), a 
Pseudomonas putida protein which transfers electrons from putidaredoxin reductase to 
cytochrome P450-cam, which is involved in the oxidation of camphor. - Terpredoxin [2], a 
Pseudomonas protein which transfers electrons from terpredoxin reductase to cytochrome 
P450-terp, which is involved in the oxidation of alpha-terpineol. - Rhodocoxin [3], a 
Rhodococcus protein which transfers electrons from rhodocoxin reductase to cytochrome 
CYP116 (thcB), which is involved in the degradation of thiocarbamate herbicides. - 
Escherichia coli ferredoxin (gene fdx) [4] whose exact function is not yet known. - 
Rhodobacter capsulatus ferredoxin VI [5], which may transfer electrons to a yet 
uncharacterized oxygenase. - Caulobacter crescentus ferredoxin (gene fdxB) [6]. In these 
proteins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are 
clustered together in the same region of the protein. Our signature pattern spans that iron- 
sulfur binding region. 

Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe- 
2S ligands]- 

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988). 

[ 2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., 
Lorence M.C. J. Biol. Chem. 267:14193-14203(1992). 

[ 3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 
177:676-687(1995). 

[ 4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992). 

[ 5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 
222:933-939(1994). 

[ 6] Amemiya K. EMBL/GenBank: X51607. 



204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide 
variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending 
upon the physiological nature of the iron-sulfur cluster(s). One of these subgroups is the 
4Fe-4S ferredoxins, which are found in bacteria and which are thus often referred to as 
'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a 
domain of twenty-six amino acid residues; each of these domains contains four cysteine 
residues that bind to a 4Fe-4S center. A number of proteins have been found [3] that include 
one or more 4Fe-4S binding domains similar to those of bacterial-type ferredoxins. These 
proteins are listed below (references are only provided for recently determined sequences). - 
The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase 
complexes (EC 1.3.99.1). These enzyme complexes, which are components of the 
tricarboxylic acid cycle, each contain three subunits: a flavoprotein, an iron-sulfur protein, 
and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur centers: 
a 2Fe-2S, a 3Fe-3S and a 4Fe-4S. - Escherichia coli anaerobic glycerol-3-phosphate 
dehydrogenase (EC 1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C 
subunit seems to be an iron-sulfur protein with two ferredoxin-like domains in the 
N-terminal part of the protein. - Escherichia coli anaerobic dimethyl sulfoxide reductase. The B 
subunit of this enzyme (gene dmsB) is an iron-sulfur protein with four 4Fe-4S ferredoxin-like 
domains. - Escherichia coli formate hydrogenlyase. Two of the subunits of this oligomeric 
complex (genes hycB and hycF) seem to be iron-sulfur proteins that each contain two 4Fe-4S 
ferredoxin-like domains. - Methanobacterium formicicum formate dehydrogenase (EC 
1.2.1.2). This enzyme is used by the archaebacteria to grow on formate. The beta chain of this 
dimeric enzyme probably binds two 4Fe-4S centers. - Escherichia coli formate 
dehydrogenases N and O (EC 1.2.1.2). The beta chains of these two enzymes (genes fdnH and 
fdoH) are iron-sulfur proteins with four 4Fe-4S ferredoxin-like domains. - Desulfovibrio 
periplasmic [Fe] hydrogenase (EC 1.18.99.1). The large chain of this dimeric enzyme binds 
three 4Fe-4S centers, two of which are located in the ferredoxin-like N-terminal region of the 
protein. - Methanobacterium thermoautotrophicum methyl viologen-reducing hydrogenase 
subunit mvhB, which contains six tandemly repeated ferredoxin-like domains and which 
probably binds twelve 4Fe-4S centers. - Salmonella typhimurium anaerobic sulfite reductase 
(EC 1.8.1.-) [4]. Two of the subunits of this enzyme (genes asrA and asrC) seem to both bind 
two 4Fe-4S centers. - A ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes 
locus of various Rhizobium species, and one from the Nif-region of Azotobacter species. - 
The 9 Kd polypeptide of chloroplast photosystem I [5] (gene psaC). This protein contains 
two low potential 4Fe-4S centers, referred to as the A and B centers. - The chloroplast frxB 
protein which is predicted to carry two 4Fe-4S centers. - A ferredoxin from a primitive 
eukaryote, the enteric amoeba Entamoeba histolytica. - Escherichia coli hypothetical protein 
yjjW, a protein with an N-terminal region belonging to the radical activating enzymes family 
(see <PDOC00834>) and two potential 4Fe-4S centers. The pattern of cysteine residues in the 
iron-sulfur region is sufficient to detect this class of 4Fe-4S binding proteins. 

Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]- 
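
Because several of the proteins listed above carry more than one ferredoxin-like domain, it can be useful to count how many times the cysteine pattern occurs in a sequence. The following is a minimal sketch, assuming Python's standard re module and a hand translation of the consensus pattern into a regular expression. The function name find_fer4_motifs and the test sequence are hypothetical, and the number of matches is only a rough indication of the number of 4Fe-4S binding domains.

```python
import re

# Hand translation of C-x(2)-C-x(2)-C-x(3)-C-[PEG] into a regular expression.
FER4_MOTIF = re.compile(r"C.{2}C.{2}C.{3}C[PEG]")

def find_fer4_motifs(sequence: str) -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in FER4_MOTIF.finditer(sequence.upper())]

# Hypothetical sequence carrying two copies of a synthetic motif instance.
motif = "CAACAACAAACP"
sequence = "MK" + motif + "GGGG" + motif + "LL"
positions = find_fer4_motifs(sequence)
print(positions)
```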

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988). 

[ 2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987). 

[ 3] Beinert H. FASEB J. 4:2483-2492(1990). 

[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991). 

[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988). 

205. NifH/frxC family signatures (fer4_NifH) 

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system responsible for biological nitrogen 
fixation. Nitrogenase is an oligomeric complex which consists of two components: 
component 1 which contains the active site for the reduction of nitrogen to ammonia and 
component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene 
nifH) which binds a single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH 
is first reduced by a protein such as ferredoxin; the reduced protein then transfers electrons to 
component 1 with the concomitant consumption of ATP. A number of proteins are known to 
be evolutionarily related to nifH. These proteins are: - Chloroplast encoded frxC (or chlL) 
protein [3]. FrxC is encoded on the chloroplast genome of some plant species; its exact 
function is not known, but it could act as an electron carrier in the conversion of 
protochlorophyllide to chlorophyllide. - Rhodobacter capsulatus proteins bchL and bchX [4]. 
These proteins are also likely to play a role in chlorophyll synthesis. There are a number of 
conserved regions in the sequence of these proteins: in the N-terminal section there is an 
ATP-binding site motif 'A' (P-loop) and in the central section there are two conserved 
cysteines which have been shown, in nifH, to be the ligands of the 4Fe-4S cluster. Two 
signature patterns that correspond to the regions around these cysteines were developed. 

Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center]- 
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]- 
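
The two signatures above each surround one of the conserved cysteines. As a minimal sketch, assuming Python's standard re module and hand translations of the two patterns, a sequence could be checked against both signatures and the position of each candidate cysteine reported. The names NIFH_SIGNATURES and ligand_cysteines, and the test sequence, are hypothetical and for illustration only.

```python
import re

# Hand translations of the two consensus patterns; the parenthesised C is the
# cysteine that the text describes as binding the iron-sulfur center.
NIFH_SIGNATURES = {
    "signature_1": re.compile(r"E.GGP.{2}[GA].G(C)[AG]G"),
    "signature_2": re.compile(r"D.LGDVV(C)GGF[AG].P"),
}

def ligand_cysteines(sequence: str) -> dict:
    """Map each signature name to the 0-based position of its cysteine, or None."""
    sequence = sequence.upper()
    result = {}
    for name, regex in NIFH_SIGNATURES.items():
        m = regex.search(sequence)
        result[name] = m.start(1) if m else None
    return result

# Hypothetical sequence built from one synthetic instance of each signature.
sequence = "MM" + "EAGGPAAGAGCAG" + "KK" + "DALGDVVCGGFAAP"
print(ligand_cysteines(sequence))
```

Reporting the two signatures separately, rather than requiring both, leaves it to the reader to decide how strict the family assignment should be.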



[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989). 

[ 2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 
257:1653-1659(1992). 

[ 3] Fujita Y., Takahashi Y., Kohchi T., Ozeki H., Ohyama K., Matsubara H. Plant Mol. Biol. 
13:551-561(1989). 

[ 4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993). 

206. Ferritin iron-binding regions signatures 

Ferritin [1,2] is one of the major non-heme iron storage proteins. It consists of a mineral core 
of hydrated ferric oxide, and a multi-subunit protein shell which encloses the former and 
assures its solubility in an aqueous environment. In animals the protein is mainly cytoplasmic 
and there are generally two or more genes that encode closely related subunits (in 
mammals there are two subunits which are known as H(eavy) and L(ight)). In plants ferritin 
is found in the chloroplast [3]. There are a number of well conserved regions in the sequence of 
ferritins. Two of these regions were selected to develop signature patterns. The first pattern is 
located in the central part of the sequence of ferritin and it contains three conserved glutamates 
which are thought to be involved in the binding of iron. The second pattern is located in the 
C-terminal section; it corresponds to a region which forms a hydrophilic channel through 
which small molecules and ions can gain access to the central cavity of the molecule; this 
pattern also includes conserved acidic residues which are potential metal-binding sites. 

Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's 
are potential iron ligands]- 

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-
[KN] [The second D and the E are potential iron ligands]- 

[ 1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987). 
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987). 

[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. 
Chem. 265:18339-18344(1990). 



207. Intermediate filaments signature (filament) 

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the 
cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 
nm wide. IF proteins are members of a very large multigene family of proteins which has 
been subdivided into five major subgroups: - Type I: Acidic cytokeratins. - Type II: Basic 
cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, 
and plasticin. - Type IV: Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: 
Nuclear lamins A, B1, B2 and C. All IF proteins are structurally similar in that they consist 
of: a central rod domain comprising some 300 to 350 residues which is arranged in 
coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal 
non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also 
non-helical, and which shows extreme length variation between different IF proteins. While IF 
proteins are evolutionarily and structurally related, they have limited sequence homologies 
except in several regions of the rod domain. A conserved region at the C-terminal extremity 
of the rod domain was used as a sequence pattern for this class of proteins. 

Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]- 

[ 1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995). 
[ 2] Steiner P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988). 
[ 3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990). 

208. Flavodoxin signature 

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport 
systems. Flavodoxins bind one FMN molecule, which serves as a redox-active prosthetic 
group. Flavodoxins are functionally interchangeable with ferredoxins. They have been 
isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature pattern 
for these proteins is derived from a conserved region in their N-terminal section; this region is 
involved in the binding of the FMN phosphate group. 

Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]- 

[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989). 

209. Growth factor and cytokines receptors family signatures (fn3) 

A number of receptors for lymphokines, hematopoietic growth factors and growth 
hormone-related molecules have been found [1 to 5] to share a common binding domain. Receptors 
known to belong to this family are: - Cytokine receptor common beta chain. This chain is 
common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor common gamma 
chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary 
neurotrophic factor receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte 
colony-stimulating factor receptor (G-CSFR). - Granulocyte-macrophage colony-stimulating 
factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta chain (IL2R-beta). - 
Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). - 
Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 
receptor alpha chain (IL7R). - Interleukin-9 receptor (IL9R). - Growth hormone receptor 
(GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR). The conserved 
region constitutes all or part of the extracellular ligand-binding region and is about 200 amino 
acid residues long. In the N-terminal part of this domain there are two pairs of cysteines known, in 
the growth hormone receptor, to be involved in disulfide bonds. 

[Schematic: extracellular region with four conserved cysteines (C C C C) paired by disulfide 
bonds, followed by the transmembrane region and the cytoplasmic region.] 

Two patterns to detect this family of receptors were used. The first one is derived from the 
first N-terminal disulfide loop; the second is a tryptophan-rich pattern located at the 
C-terminal extremity of the extracellular region. 

Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a 
disulfide bond]- 

Consensus pattern: [STGL]-x-W-[SG]-x-W-S- 

[ 1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989). 

[ 2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990). 

[ 3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., 
March C.J. Trends Biochem. Sci. 15:265-270(1990). 

[ 4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989). 

[ 5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990). 

210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf) 

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third 
step in de novo purine biosynthesis, the transfer of a formyl group to 
5'-phosphoribosylglycinamide. In higher eukaryotes, GART is part of a multifunctional enzyme 
polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and 
yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the 
Escherichia coli enzyme, an aspartic acid residue has been shown to be involved in the 
catalytic mechanism. The region around this active site residue is well conserved in GART 
from prokaryotic and eukaryotic sources and can be used as a signature pattern. Mammalian 
formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for 
the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into 
tetrahydrofolate. It is a protein of about 900 amino acids consisting of three domains; the 
N-terminal domain (200 residues) is structurally related to GARTs. Escherichia coli 
methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying 
the free amino group of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt 
seems to be evolutionarily related to GART's active site region. 

Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-
G-x(2)-[LIVT]-x(6)-[LIVM] [D is the active site residue]- 

[ 1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990). 

[ 2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991). 

[ 3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 
174:4294-4301(1992). 

211. G10 protein signatures 

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range 
of eukaryotic species. The function of G10 is still unknown. G10 is a protein of about 17 to 
18 Kd (143 to 157 residues) which is hydrophilic and whose C-terminal half is rich in 
cysteines and could be involved in metal-binding. As signature patterns, two of these 
cysteine-rich segments were selected. 

Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P- 
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]- 

[ 1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803- 
815(1989). 

212. G-protein alpha subunit 

G proteins couple receptors of extracellular signals to intracellular signaling 
pathways. The G protein alpha subunit binds guanyl nucleotide and is a weak GTPase. 
Number of members: 195 

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR, Science 
1994;265:1405-1412. 

[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 
1996;21:41-44. 

213. Glucose-6-phosphate dehydrogenase active site (G6PD) 

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the 
pentose pathway, the oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A 
lysine residue has been identified as a reactive nucleophile associated with the activity of the 
enzyme. The sequence around this lysine is totally conserved from bacterial to mammalian 
G6PDs and can be used as a signature pattern. 

Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]- 

[ 1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 
212:41-49(1993). 

214. GATA-type zinc finger domain 

The GATA family of transcription factors are proteins that bind to DNA sites with the 
consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of 
genes. Proteins currently known to belong to this family are: - GATA-1 [1] (also known as 
Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes 
expressed in erythroid cells. It is a transcriptional activator which probably serves as a 
general 'switch' factor for erythroid development. - GATA-2 [2], a transcriptional activator 
which regulates endothelin-1 gene expression in endothelial cells. - GATA-3 [3], a 
transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta 
genes. - GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues 
and heart. - Drosophila protein pannier (or DGATAa) (gene pnr) which acts as a repressor of 
the achaete-scute complex (as-c). - Bombyx mori BCFI [5], which regulates the expression of 
chorion genes. - Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes 
containing the GATA region, including vitellogenin genes [6]. - Ustilago maydis urbs1 [7], a 
protein involved in the repression of the biosynthesis of siderophores. - Fission yeast protein 
GAF2. All these transcription factors contain a pair of highly similar 'zinc finger' type 
domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a 
single zinc finger motif highly related to those of the GATA transcription factors. These 
proteins are: - Drosophila box A-binding factor (ABF) (also known as protein serpent (gene 
srp)) which may function as a transcriptional activator protein and may play a key role in the 
organogenesis of the fat body. - Emericella nidulans areA [8], a transcriptional activator 
which mediates nitrogen metabolite repression. - Neurospora crassa nit-2 [9], a 
transcriptional activator which turns on the expression of genes coding for enzymes required 
for the use of a variety of secondary nitrogen sources, during conditions of nitrogen 
limitation. - Neurospora crassa white collar proteins 1 and 2 (WC-1 and WC-2), which 
control expression of light-regulated genes. - Saccharomyces cerevisiae DAL81 (or UGA43), 
a negative nitrogen regulatory protein. - Saccharomyces cerevisiae GLN3, a positive nitrogen 
regulatory protein. - Saccharomyces cerevisiae GAT1. - Saccharomyces cerevisiae GZF3. 

Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-
[AS]-C [The four C's are zinc ligands] 
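
The DNA site recognized by these factors, (A/T)GATA(A/G), can also be located by a simple scan. The following is a minimal sketch, assuming Python's standard re module, that reports candidate sites on both strands of a DNA sequence. The function names and the test sequences are hypothetical and for illustration only; a sequence match does not by itself show that a GATA factor binds the site.

```python
import re

# (A/T)GATA(A/G) as a regular expression; the lookahead lets overlapping
# candidate sites be reported as well.
SITE = re.compile(r"(?=([AT]GATA[AG]))")
SITE_LENGTH = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_gata_sites(dna: str) -> list:
    """Return sorted (position, strand) pairs; positions are 0-based on the given strand."""
    dna = dna.upper()
    n = len(dna)
    hits = [(m.start(1), "+") for m in SITE.finditer(dna)]
    # A site starting at i on the reverse complement covers forward
    # positions n - i - SITE_LENGTH to n - i - 1.
    hits += [(n - m.start(1) - SITE_LENGTH, "-")
             for m in SITE.finditer(reverse_complement(dna))]
    return sorted(hits)

print(find_gata_sites("CCAGATAAGG"))
print(find_gata_sites("TTATCT"))
```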

[ 1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990). 

[ 2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 
266:16188-16192(1991). 

[ 3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO 
J. 10:1187-1192(1991). 

[ 4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 
11:4651-4659(1991). 

[ 5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994). 

[ 6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995). 

[ 7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 
13:7091-7100(1993). 

[ 8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. 
Trends Genet. 5:291-291(1989). 

[ 9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990). 



215. Glutamine amidotransferases class-I active site (GATase) 

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group 
from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen 
group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) 
[1]. The GATase domain exists either as a separate polypeptide subunit or as part of a larger 
polypeptide fused in different ways to a synthase domain. On the basis of sequence 
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as 
trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found 
in the following enzymes: - The second component of anthranilate synthase (AS) (EC 
4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. 
AS is generally a dimeric enzyme: the first component can synthesize anthranilate using 
ammonia rather than glutamine, whereas component II provides the GATase activity. In some 
bacteria and in fungi the GATase component of AS is part of a multifunctional protein that 
also catalyzes other steps of the biosynthesis of tryptophan. - The second component of 
4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that 
functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from 
chorismate and glutamine. The second component (gene pabA) provides the GATase activity 
[4]. - CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the 
biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. 
CTP synthase is a single chain enzyme that contains two distinct domains; the GATase 
domain is in the C-terminal section [2]. - GMP synthase (glutamine-hydrolyzing) (EC 
6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 
5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct 
domains; the GATase domain is in the N-terminal section [5]. - Glutamine-dependent 
carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an enzyme involved in both 
arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of 
carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is 
composed of two subunits: the large chain (gene carB) provides the CPSase activity, while 
the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in 
arginine biosynthesis is also composed of two subunits: CPA1 (GATase), and CPA2 
(CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by 
a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD 
in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein 
[6]. - Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes 
the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM 
synthase II is composed of two subunits: a small chain (gene purQ) which provides the 
GATase activity and a large chain (gene purL) which provides the aminator activity. - The 
histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of 
histidine in prokaryotes. In the second component of AS a cysteine has been shown [7] to be 
essential for the amidotransferase activity. The sequence around this residue is well conserved 
in all the above GATase domains and can be used as a signature pattern for class-I GATase. 

Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-
[QEH]-x-[LIVMFA] [C is the active site residue]- 

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987). 

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984). 

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). 

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A. A., Smith J.M. J. Biol. Chem. 
260:3350-3354(1985). 

[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157- 
164(1993). 

[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980). 



216. Glutamine amidotransferases class-II active site (GATase_2) 

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group 
from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen 
group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) 
[1]. The GATase domain exists either as a separate polypeptide subunit or as part of a larger 
polypeptide fused in different ways to a synthase domain. On the basis of sequence 
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as 
trpG-type) and class-II (also known as purF-type). Class-II GATase domains have been 
found in the following enzymes: - Amido phosphoribosyltransferase (glutamine 
phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which catalyzes 
the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP 
to form 5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast). - 
Glucosamine-fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction 
in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 
6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in 
yeast). - Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is 
responsible for the synthesis of asparagine from aspartate and glutamine. A cysteine is 
present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has 
been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5] to be 
important for the catalytic mechanism. 

Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]- 
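
In PROSITE notation a leading '<' restricts a pattern to the N-terminal end of the sequence, which fits the statement above that the cysteine lies at the N-terminal extremity of the mature enzyme. As a minimal sketch, assuming Python's standard re module and a hand translation of the pattern, the anchored search could be written as follows. The function name active_site_cysteine and the test sequences are hypothetical and for illustration only.

```python
import re

# Hand translation of <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]:
# '^' plays the role of '<' (start of sequence) and .{0,11} allows up to
# eleven residues before the cysteine.
GATASE2 = re.compile(r"^.{0,11}?(C)[GS][IV][LIVMFYW][AG]")

def active_site_cysteine(sequence: str):
    """Return the 0-based position of the candidate cysteine, or None."""
    m = GATASE2.match(sequence.upper())
    return m.start(1) if m else None

# The motif is found when it lies within the first residues of the sequence...
print(active_site_cysteine("MCGIVAKK"))
# ...but not when more than eleven residues precede the cysteine.
print(active_site_cysteine("A" * 15 + "CGIVA"))
```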

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987). 

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984). 

[ 4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989). 

[ 5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 
258:10582-10585(1983). 

217. GDP dissociation inhibitor (GDI) 

[1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson LA, 
Balch WE, Nature 1996;381:42-48. 

218. Oxidoreductase family (GFO_IDH_MocA) 

This family of enzymes utilises NADP or NAD. It is called the 
GFO/IDH/MOCA family in Swiss-Prot. 

[1] Kingston RL, Scopes RK, Baker EN, Structure 1996;4:1413-1428. 

219. GHMP kinases putative ATP-binding domain 

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region 
which is probably involved in the binding of ATP [1]. These kinases are listed below. - 
Galactokinase (EC 2.7.1.6). - Homoserine kinase (EC 2.7.1.39). - Mevalonate kinase (EC 
2.7.1.36). - Phosphomevalonate kinase (EC 2.7.4.2). This group of kinases was called 
'GHMP' (from the first letter of their substrates). 

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]- 

[ 1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991). 

220. Glucose inhibited division protein A family signatures (GIDA) 

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose 
function is not yet known and whose sequence is highly conserved. It is evolutionarily related 
to yeast hypothetical protein YGL236C, Caenorhabditis elegans hypothetical protein 
F52H3.2 and a Bacillus subtilis protein called gid (and which is different from B. subtilis 
gidA). Two highly conserved regions were selected as signature patterns. Both regions are 
located in the central region of the protein. 

Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]- 
Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-
[LIVMT]-N-A- 

221. Glu / Leu / Phe / Val dehydrogenases active site (GLFV_dehydrog) 

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) 
are enzymes that catalyze the NAD- or NADP-dependent reversible deamination 
of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are generally 
involved with either ammonia assimilation or glutamate catabolism. 

- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that 
catalyzes the reversible deamination of leucine and several other aliphatic 
amino acids to their keto analogues [3]. 

- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme 
that catalyzes the reversible deamination of L-phenylalanine into 
phenylpyruvate [4]. 

- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that 
catalyzes the reversible deamination of L-valine into 
3-methyl-2-oxobutanoate [5]. 

These dehydrogenases are structurally and functionally related. A conserved lysine residue 
located in a glycine-rich region has been implicated in the catalytic mechanism. The 
conservation of the region around this residue allows the derivation of a signature pattern for 
such type of enzymes. 

Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site 
residue]. Sequences known to belong to this class detected by the pattern: all. 

Note: all known sequences from this family have Pro in the last position of the pattern, with 
the exception of yeast GluDH, which has Leu. 

[ 1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992). 
[ 2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993). 
[ 3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. 
Biochemistry 27:9056-9062(1988). 

[ 4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371- 
376(1991). 

[ 5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993). 

222. GMC oxidoreductases signatures

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily
related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose +
oxygen -> delta-gluconolactone + hydrogen peroxide.

- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen ->
formaldehyde + hydrogen peroxide.

- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline +
unknown acceptor -> betaine aldehyde + reduced acceptor.

- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose +
unknown acceptor -> delta-gluconolactone + reduced acceptor.

- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces
strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen
peroxide.

- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic
medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme
involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues
which share a number of regions of sequence similarities. One of these regions, located in the
N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other
conserved domains is not yet known; two of these domains were selected as signature patterns.
The first one is located in the N-terminal section of these enzymes, about 50 residues after the
ADP-binding domain, while the second one is located in the central section.

Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-
[PAG]-x(5)-[DNESH]-

Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G-

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).

[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

223. (GMP_synt_C)

Glutamine amidotransferases class-I active site 

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group 
from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen 
group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) 
[1]. The GATase domain exists either as a separate polypeptide subunit or as part of a larger 
polypeptide fused in different ways to a synthase domain. On the basis of sequence 
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as
trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found 
in the following enzymes: 

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the 
biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric 
enzyme: the first component can synthesize anthranilate using ammonia rather than 
glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi 
the GATase component of AS is part of a multifunctional protein that also catalyzes other 
steps of the biosynthesis of tryptophan. 

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a
dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of
para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene
pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of 
pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is 
a single chain enzyme that contains two distinct domains; the GATase domain is in the C- 
terminal section [2]. 

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP- 
dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is 
a single chain enzyme that contains two distinct domains; the GATase domain is in the
N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an 
enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP- 
dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria 
GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase
activity, while the small chain (gene carA) provides the GATase activity. In yeast the
enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase),
and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are
catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in
Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal
extremity of this polyprotein [6].

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the 
fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM
synthase II is composed of two subunits: a small chain (gene purQ) which provides the
GATase activity and a large chain (gene purL) which provides the aminator activity.

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the 
biosynthesis of histidine in prokaryotes. 

In the second component of AS a cysteine has been shown [7] to be essential for the
amidotransferase activity. The sequence around this residue is well conserved in all the
above GATase domains and can be used as a signature pattern for class-I GATase.

Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-
[QEH]-x-[LIVMFA] [C is the active site residue]. Sequences known to belong to this class
detected by the pattern: ALL, except for 6 sequences.

Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD- 
CPSase where it is replaced by Ala. 
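
Because the class-I GATase signature is anchored on the active-site cysteine, a match can also report where that residue sits in a sequence. The sketch below is illustrative only: the regular expression is a hand translation of the pattern quoted above, and the function name `active_site_positions` and the example fragment are hypothetical.

```python
import re

# Hand-translated form of the class-I GATase signature quoted above;
# the parentheses capture the active-site cysteine.
GATASE_I = re.compile(r"[PAS][LIVMFYT][LIVMFY]G[LIVMFY](C)[LIVMFYN]G.[QEH].[LIVMFA]")


def active_site_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the candidate active-site cysteine."""
    return [m.start(1) + 1 for m in GATASE_I.finditer(sequence.upper())]


# Hypothetical fragment, constructed only to satisfy the pattern.
example = "MKTPLLGICLGAQALEE"
print(active_site_positions(example))
```

Note that `finditer` reports non-overlapping matches only, and that a match indicates a candidate site rather than demonstrated catalytic activity.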


[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem.
260:3350-3354(1985).

[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

224. Glutathione peroxidases signatures (GSHPx)

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the
reduction of hydroperoxides by glutathione. Its main function is to protect against the
damaging effect of endogenously formed hydroperoxides. In higher vertebrates at least
four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a
gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an
epididymal secretory form (GSHPx-EP). In addition to these characterized forms, the
sequence of a protein of unknown function [5] has been shown to be evolutionarily related to
those of GSHPx's.

In filarial nematode parasites such as Brugia pahangi the major soluble cuticular protein,
known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the
immune reaction of the mammalian host by neutralizing the products of the oxidative burst of
leukocytes [6]. Escherichia coli protein btuE, a periplasmic protein involved in the transport
of vitamin B12, is also evolutionarily related to GSHPx's; the significance of this relationship
is not yet clear.

Selenium, in the form of selenocysteine [7], is part of the catalytic site of GSHPx. The
sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the
related proteins and can be used as a signature pattern. As a second signature for this family
of proteins a highly conserved octapeptide located in the central section of these proteins was
selected.

Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-
[GA]-x-T [C is the active site selenocysteine residue]
Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-

[ 1] Mannervik B. Meth. Enzymol. 113:490-495(1985).

[ 2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein
Eng. 2:239-246(1988).

[ 3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).

[ 4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J.
Biochem. 108:145-148(1990).

[ 5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).

[ 6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).

[ 7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

225. (GST) 

Glutathione S-transferases 

Function: conjugation of reduced glutathione to a variety of targets. Also included in
the alignment, but not GSTs, are:

- S-crystallins from squid. Similarity to GST was previously noted.

- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity
not previously recognized. Supported by HMM and manual alignment inspection.

- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and
stringent starvation proteins in E. coli. Not known to have GST activity; similarity not
previously recognized. Supported by HMM and manual alignment inspection.

The alignment spans the entire protein.

226. GTP1/OBG family signature 

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This
family currently includes:

- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster.
- Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP.
- Escherichia coli hypothetical protein yhbZ.
- Haemophilus influenzae hypothetical protein HI0877.
- Mycoplasma genitalium hypothetical protein MG384.
- Yeast hypothetical protein YAL036c (FUN11).
- Yeast hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.

The function of the proteins that belong to this family is not yet known. They are
polypeptides of about 40 to 48 Kd which contain the five small sequence elements
characteristic of GTP-binding proteins [3]. As a signature pattern the region that corresponds
to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.



Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G - 

[ 1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res.
Commun. 189:363-370(1992).

[ 2] Hudson J.D., Young P.G. Gene 125:191-193(1993). 

[ 3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991). 

227. (GTP_EFTU1) 

ATP/GTP-binding site motif A (P-loop) 

From sequence comparisons and crystallographic data analysis it has been shown
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number
of more or less conserved sequence motifs. The best conserved of these motifs is a
glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This
sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Listed below
are a number of protein families for which the relevance of the presence of such a motif has
been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, GO, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape
detection because the structure of their ATP-binding site is completely different from that of
the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In
other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this
is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]-
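
The adenylate kinase exception described above can be illustrated with a short Python sketch. It is not part of the original specification: the relaxed pattern that also accepts Gly in the last position is an assumption made only for illustration, and both sequence fragments and the function name `has_p_loop` are hypothetical.

```python
import re

# P-loop pattern exactly as given in the text: [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")
# Assumed relaxed variant that also tolerates Gly in the last position,
# the deviation the text describes for adenylate kinase.
P_LOOP_RELAXED = re.compile(r"[AG].{4}GK[STG]")


def has_p_loop(sequence: str, allow_gly: bool = False) -> bool:
    """Report whether the sequence contains a P-loop-like motif."""
    regex = P_LOOP_RELAXED if allow_gly else P_LOOP
    return regex.search(sequence.upper()) is not None


# Illustrative fragments, not taken from the specification.
canonical = "MTEYKLVVVGAGGVGKSALT"   # ends the motif with Ser
variant = "MRIILLGGPGSGKGTQAE"       # has Gly where Ser/Thr is expected

print(has_p_loop(canonical), has_p_loop(variant), has_p_loop(variant, allow_gly=True))
```

Relaxing the last position would be expected to raise the number of chance matches, so the strict form is kept as the default in this sketch.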

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982). 
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985). 

[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). 
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814- 
1818(1987). 

[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990). 
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993). 

[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. 
Bioenerg. Biomembr. 22:571-592(1990). 

[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). 

[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski
P.P. Nature 337:121-122(1989).

[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 
17:4713-4730(1989). 

GTP-binding elongation factors signature (GTP_EFTU2) 

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein
biosynthesis. In both prokaryotes and eukaryotes, there are three distinct types of elongation
factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to
                           the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus
                           allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter
                           from the A site to the P site.

The GTP-binding elongation factor family also includes the following proteins:

- Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins
interact with release factors that bind to ribosomes that have encountered a stop codon at their
decoding site and help them to induce release of the nascent polypeptide. The yeast protein
was known as SUP2 (and also as SUP35, SUF12 or GST1) and the human homolog as
GST1-Hs.

- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3 is a class-II
RF, a GTP-binding protein that interacts with class I RFs (see <PDOC00607>) and enhances
their activity [4].

- Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in
Caenorhabditis elegans (ZK1236.1).

- Yeast HBS1 [5].

- Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha.

- Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace
EF-Tu for the insertion of selenocysteine directed by the UGA codon.

- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as
Campylobacter jejuni, Enterococcus faecalis, Streptococcus mutans and Ureaplasma
urealyticum. Tetracycline binds to the prokaryotic ribosomal 30S subunit and inhibits binding
of aminoacyl-tRNAs. These proteins abolish the inhibitory effect of tetracycline on protein
synthesis.

- Rhizobium nodulation protein nodQ [10].

- Escherichia coli hypothetical protein yihK [11].

In EF-1-alpha, a specific region has been shown [12] to be involved in a conformational
change mediated by the hydrolysis of GTP to GDP. This region is conserved in both
EF-1alpha/EF-Tu and EF-2/EF-G and thus seems typical for GTP-dependent proteins which
bind non-initiator tRNAs to the ribosome. The pattern developed for this family of proteins
includes that conserved region.

Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]- 
[IV]-x(2)-[GSTACKRNQ]- 

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New
York (1988).

[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).

[ 3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I.,
Paushkin S.V., Nierras C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J.
14:4365-4373(1995).

[ 4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol.
Chem. 270:10595-10600(1995).

[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell
71:97-105(1992).

[ 6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem
R.K.H., Wang E. J. Biol. Chem. 266:10429-10437(1991).

[ 7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).

[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).

[ 9] Leblanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol.
170:3618-3626(1988).

[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol.
Microbiol. 3:745-755(1989).

[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res.
21:3391-3398(1993).

[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

228. GTP cyclohydrolase II. 

GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin. 

[1] Richter G, Ritz H, Katzenmeier G, Volk R, Kohnle A, Lottspeich F, Allendorf D, Bacher 
A, J Bacteriol 1993;175:4045-4051. 

229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of a
uridyldiphosphate group to galactose (or glucose) 1-phosphate. During the reaction, the
uridyl moiety links to a histidine residue. In the Escherichia coli enzyme, it has been shown
[1] that two histidine residues separated by a single proline residue are essential for enzyme
activity. On the basis of sequence similarities, two apparently unrelated families seem to
exist. Class-I enzymes are found in eukaryotes as well as some bacteria such as Escherichia
coli or Streptomyces lividans, while class-II enzymes have been found so far only in bacteria
such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature patterns for both families
were developed. For class-I enzymes the signature is based on the active site residues. For
class-II enzymes a region which also includes two conserved histidines was chosen.

Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site
residues]

Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G-

Note: class-I enzymes are structurally related to the HIT family of proteins (see
<PDOC00694>).

[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991). 



230. Gamma-thionins family signature 

The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley
(gamma-hordothionins) which are toxic to animal cells and inhibit protein
synthesis in cell free systems [1].

- A flower-specific thionin (FST) from tobacco [2].

- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, 
mustard, turnip and Arabidopsis thaliana [3]. 

- Inhibitors of insect alpha-amylases from sorghum [4]. 

- Probable protease inhibitor P322 from potato. 

- A germination-related protein from cowpea [5]. 

- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a
gamma-thionin domain at its N-terminus and a proline-rich C-terminal domain.

- Soybean sulfur-rich protein SE60 [7]. 

- Vicia faba antibacterial peptides fabatin-1 and -2. 

In their mature form, these proteins generally consist of about 45 to 50 amino-acid
residues. As shown in the following schematic representation, these peptides contain eight
conserved cysteines involved in disulfide bonds.

xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC

[Schematic: disulfide-bond connections between the cysteines and the span covered by the
pattern]

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The 
four C's are involved in disulfide bonds]- 
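
As a rough illustration of how this signature pins down the four cysteines it covers, the Python sketch below locates them in a sequence. The regular expression is a hand translation of the pattern above; the function name `signature_cysteines` and the example fragment are hypothetical and were constructed only to satisfy the pattern.

```python
import re
from typing import List, Optional

# Hand translation of the gamma-thionin signature; each (C) group
# captures one of the four cysteines covered by the pattern.
GAMMA_THIONIN = re.compile(
    r"[KRG].(C).{3}[SV].{2}[FYWH].[GF].(C).{5}(C).{3}(C)"
)


def signature_cysteines(sequence: str) -> Optional[List[int]]:
    """Return 1-based positions of the four signature cysteines, or None."""
    match = GAMMA_THIONIN.search(sequence.upper())
    if match is None:
        return None
    return [match.start(i) + 1 for i in range(1, 5)]


# Hypothetical fragment assembled element by element from the pattern.
example = ("KAC" + "AAA" + "S" + "AA" + "F" + "A" + "G" + "A"
           + "C" + "AAAAA" + "C" + "AAA" + "C")
print(signature_cysteines(example))
```

The sketch reports positions only; which cysteines pair with each other in disulfide bonds is not encoded in the pattern and is not inferred here.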



[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M.
Biochemistry 32:715-724(1993).

[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet.
234:89-96(1992).

[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue
B.P.A., Broekaert W.F. FEBS Lett. 316:233-240(1993).

[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).

[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).

[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

231. Gelsolin. Gelsolin repeat. Number of members: 170 

[1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin
severing, capping, and nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, 
McLaughlin PJ, Robinson RC; Cell 1997;90:661-670. 

232. Germin family signature 

Germins [1] are a family of homopentameric cereal glycoproteins expressed during
germination which may play a role in altering the properties of cell walls during germinative
growth. It has been shown that wheat and barley germins act as oxalate oxidases (EC 1.2.3.4),
enzymes that catalyze the oxidative degradation of oxalate to carbonate and hydrogen
peroxide. Germins are highly similar to:

- Germin-like proteins from various plants such as rape, violet or white mustard.

- Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during
spherulation, a process induced by various forms of environmental stress which leads to
encystment and dormancy.

As a signature pattern the best conserved region was selected: a decapeptide located in the
central section of these proteins.

Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]- 

[ 1] Lane B.G. FASEB J. 8:294-301(1994). 



233. (GlutR) 

Glutamyl-tRNA reductase signature 

Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all
tetrapyrroles including porphyrin derivatives such as chlorophyll and heme. ALA can be
synthesized via two different pathways: the Shemin (or C4) pathway, which involves the
single step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway, which starts from the five-carbon skeleton of
glutamate. The C5 pathway operates in the chloroplast of plants and algae, in cyanobacteria,
in some eubacteria and in archaebacteria.

The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1]
which catalyzes the NADP-dependent conversion of glutamate-tRNA(Glu) to
glutamate-1-semialdehyde (GSA) with the concomitant release of tRNA(Glu) which can
then be recharged with glutamate by glutamyl-tRNA synthetase.

GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved
regions. The best conserved region is located in positions 99 to 122 in the sequence of known
GluTR. This region seems important for the activity of the enzyme. We have developed a
signature pattern from that conserved region.

Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-
[LIVM](2)-[GF]-E-x-[EQR]-[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).

234. (Glycoprotease)

Glycoprotease family signature (aka Peptidase_M22) 

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a
metalloprotease secreted by Pasteurella haemolytica which specifically cleaves
O-sialoglycoproteins such as glycophorin A. The sequence of GCP is highly similar to the
following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).

- Bacillus subtilis hypothetical protein ydiR.

- Mycobacterium leprae hypothetical protein U229E. 

- Mycobacterium tuberculosis hypothetical protein MtCY78.10. 

- Synechocystis strain PCC 6803 hypothetical protein slr0807. 

- Methanococcus jannaschii hypothetical protein MJ1130. 

- Haloarcula marismortui hypothetical protein in HSH 3 1 region. 

- Yeast hypothetical protein YKR038c. 

- Yeast hypothetical protein QRI7. 

One of the conserved regions contains two conserved histidines. It is possible that this region 
is involved in coordinating a metal ion such as zinc. 

Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-
[AG]-H-[LIVM]
Sequences known to belong to this class detected by the pattern: ALL.

Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

235. (Glucosamine_iso) 

Glucosamine/galactosamine-6-phosphate isomerases signature 

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or Glc-6-P deaminase) is the enzyme
responsible for the conversion of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is
the last specific step in the pathway for N-acetylglucosamine (GlcNAC) utilization in bacteria
such as Escherichia coli (gene nagB) or in fungi such as Candida albicans (gene NAG1).

Glc-6-P isomerase is evolutionarily related to:

- A putative Escherichia coli galactosamine-6-phosphate isomerase (gene agaI) [2].
- Escherichia coli hypothetical protein yieK.
- Bacillus subtilis hypothetical protein ybfT.

As a signature pattern a conserved region located in the central part of these enzymes was
selected. This region contains a conserved histidine which has been shown [1], in nagB, to be
important for the pyranose ring-opening step of the catalytic mechanism.

Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-
H-

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. 
Structure 3:1323-1332(1995). 

[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231- 
250(1996). 

236. Pneumovirus attachment glycoprotein G (glycoprotein G) 

This family includes attachment proteins from respiratory syncytial virus. Glycoprotein
G has not been shown to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The 
amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular. The 
extracellular region contains four completely conserved cysteine residues. 

[1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL, Proc Natl Acad Sci U S A 
1987;84:5625-5629. 

237. Glycosyl transferases group 1 

Mutations in this domain of Swiss:P37287 lead to disease (Paroxysmal Nocturnal 
haemoglobinuria). Members of this family transfer activated sugars to a variety of substrates, 
including glycogen, Fructose-6-phosphate and lipopolysaccharides. Members of this family 
transfer UDP, ADP, GDP or CMP linked sugars. The eukaryotic glycogen synthases may be 
distant members of this family. 

238. Glycosyl transferases (Glycos_transf_2) 

Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, 
GDP-mannose or CDP-abequose, to a range of substrates including cellulose, dolichol 
phosphate and teichoic acids. 

239. (Glycos_transf_3)

Thymidine and pyrimidine-nucleoside phosphorylases signature 

Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of 
thymidine, deoxyuridine and their analogues to their respective bases and 2-deoxyribose 1- 
phosphate. This enzyme regulates the availability of thymidine and is therefore essential to 
nucleic acid metabolism. 

In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 48 
Kd [1]. In humans it was first identified as platelet-derived endothelial cell growth factor
(PD-ECGF) [E1] before being recognized [2] as thymidine phosphorylase.

Bacterial pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme 
evolutionary and structurally related to thymidine phosphorylase. 

A well conserved region of 19 residues located in the N-terminal part of these proteins
was selected as a signature pattern for these enzymes.

Consensus pattern: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky T.A., Ealick
S.E. J. Biol. Chem. 265:14016-14022(1990).

[ 2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K.,
Yamada Y. Nature 356:668-668(1992).

[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).

240. Glycos_transf_4. Glycosyl transferase. Number of members: 44. 

[1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P
GlcNAc/MurNAc-1-P transferases. Lehrman MA; Glycobiology 1994;4:768-771.

241. Glycosyl hydrolases family 15. 21 members.

242. Glycosyl hydrolases family 16 signature 

It has been shown [1] that the following glycosyl hydrolases can be classified into a single
family on the basis of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but
also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus
marinus (gene bglA).

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).

- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).

- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).

- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

Two closely clustered conserved glutamates have been shown [2] to be involved in the
catalytic activity of Bacillus licheniformis lichenase. The region that contains these residues
was used as a signature pattern.

Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's
are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem.
269:14530-14535(1994).

243. Glycosyl hydrolases family 17 signature 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single
family on the basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from
various plants. This enzyme may be involved in the defense of plants against pathogens
through its ability to degrade fungal cell wall polysaccharides.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene
BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion
during mating, and in spore release during sporulation.

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section.
This region contains a conserved tryptophan residue which could be involved in the
interaction with the glucan substrates [2] and it also contains a conserved glutamate which
has been shown [3] to act as the nucleophile in the catalytic mechanism. This region was used
as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ]
[E is an active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. 
Acad. Sci. U.S.A. 91:2785-2789(1994). 

244. Glyoxalase I signatures 

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal
pathway, the transformation of methylglyoxal and glutathione into S-lactoylglutathione which
is then converted by glyoxalase II to lactic acid [1]. Glyoxalase I is a ubiquitous enzyme
which binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric
while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In
bacteria and mammals, the enzyme is a protein of about 130 to 180 residues while in fungi it
is about twice as long. In these organisms the enzyme is built out of the tandem repeat of a
homologous domain. Two signature patterns for this family were derived. The first one is
located in the N-terminal region while the second one is located in the central section of the
protein and contains a conserved histidine that could be implicated in the binding of the zinc
atom.

Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]

Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]



[ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).

245. (Glypican)
Glypicans signature 

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell 
membranes by a glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins 
consist of three separate domains: 

a) A signal sequence; 

b) An extracellular domain of about 500 residues that contains 12 conserved 
cysteines probably involved in disulfide bonds and which also contains the 
sites of attachment of the heparan sulfate glycosaminoglycan side chains; 

c) A C-terminal hydrophobic region which is post-translationally removed 
after formation of the GPI-anchor. 

The proteins known to belong to this family are: 

- Glypican 1 (GPC1). 

- Glypican 2 (GPC2) or cerebroglycan. 

- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an
X-linked genetic disease, Simpson-Golabi-Behmel syndrome (SGBS).

- K-glypican. 

- Glypican 5 (GPC5). 

- Drosophila protein dally. 

The signature pattern that was developed for glypicans is located in the central section of 
the extracellular domain and contains five of the conserved cysteines. 

Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The
C's are probably involved in disulfide bonds]
Sequences known to belong to this class detected by the pattern: ALL, except for dally.



[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

246. Granins signatures

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the
secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact
function(s) of these proteins is not yet known but they seem to be the precursors of
biologically active peptides and/or they may act as helper proteins in the packaging of peptide
hormones and neuropeptides. Three members of this family of proteins show some sequence
similarities:

- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of
the peptide pancreastatin which strongly inhibits glucose-induced insulin release from the
pancreas.

- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.

- Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.

Apart from their subcellular location and the abundance of acidic residues (Asp and Glu),
these proteins do not share many structural similarities. Only one short region, located in the
C-terminal section, is conserved in all these proteins. Chromogranins A and B share a region
of high similarity in their N-terminal section; this region includes two cysteine residues
involved in a disulfide bond.

Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C
[The two C's are linked by a disulfide bond]

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991). 
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989). 

247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the
dnaK chaperone. It seems to accelerate the release of ADP from dnaK thus allowing dnaK to
recycle more efficiently. GrpE is a protein of about 22 to 25 Kd. In yeast, an evolutionarily
related mitochondrial protein (gene GRPE) has been shown [2] to associate with the
mitochondrial hsp70 protein and thus to play a role in the import of proteins from the
cytoplasm. As a signature pattern, the most conserved region of grpE was selected. It is
located in the C-terminal section.

Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-
[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV]

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M.,
Morishima N., Schatz G. EMBO J. 13:1998-2006(1994).

248. Guanylate kinase signature and profile 

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of
GMP into GDP. It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such
as Escherichia coli), lower eukaryotes (such as yeast) and in vertebrates, GK is a highly
conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4] to be
structurally similar to the following proteins:

- Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly
similar to GK, but contains a frameshift mutation in the N-terminal section and could
therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more
copies of the DHR domain, an SH3 domain (see <PDOC50002>) as well as a C-terminal
GK-like domain. These proteins are collectively termed MAGUKs (membrane-associated
guanylate kinase homologs) [5]:

- Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1). This protein is
associated with septate junctions in developing flies and defects in the dlg1 gene cause
neoplastic overgrowth of the imaginal disks.

- Mammalian tight junction protein Zo-1.

- A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of
NMDA receptor subunits. This family currently consists of SAP90/PSD-95,
CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.

- Vertebrate 55 Kd erythrocyte membrane protein (p55). p55 is a palmitoylated,
membrane-associated protein of unknown function.

- Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of
the vulva.

- Rat protein CASK.

- Human protein DLG2.

- Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region is
not conserved in the GK-like domain of the above proteins which are therefore unlikely to be
kinases. However, these proteins retain the residues known, in GK, to be involved in the
binding of GMP. As a signature pattern a highly conserved region was selected that contains
two arginines and a tyrosine which are involved in GMP-binding.

Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992). 
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).

[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993). 
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994). 

249. (Glyco_hydro_35) 

Glycosyl hydrolases family 35 putative active site 

Beta-galactosidases (EC 3.2.1.23) from mammals, fungi, plants and the bacterium
Xanthomonas manihotis are evolutionarily related [1,2]. They belong to family 35 in the
classification of glycosyl hydrolases [3,E1].

Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) which cleaves the 
terminal galactose from gangliosides, glycoproteins, and glycosaminoglycans and whose 
deficiency is the cause of the genetic disease Gm(1) gangliosidosis (Morquio disease type B).

One of the best conserved regions in these enzymes contains a glutamic acid residue which, on
the basis of similarities with other families of glycosyl hydrolases [4], probably acts as the
proton donor in the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY] [The second E is the putative
active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995). 
[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour 
G.B. Plant Physiol. 108:1099-1107(1995). 
[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993). 

[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. 
Sci. U.S.A. 92:7090-7094(1995).

250. (Glyco_hydro_16)

Glycosyl hydrolases family 16 signature 

It has been shown [1] that the following glycosyl hydrolases can be classified into a single 
family on the basis of sequence similarities: 

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from
Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter
succinogenes and Rhodothermus marinus (gene bglA).

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).

- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).

- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA). 

- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA). 

Two closely clustered conserved glutamates have been shown [2] to be involved in the 
catalytic activity of Bacillus licheniformis lichenase. The region that contains these residues
was used as a signature pattern.

Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are
active site residues]
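
Because the pattern above marks the two glutamates as active site residues, it can be useful to report where they fall in a matching sequence. The sketch below is an illustration only: the regular expression is a hand translation of the consensus pattern, and the function name and the example sequences are hypothetical.

```python
import re

# Hand translation of E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA];
# the two capture groups mark the glutamates annotated as active site residues.
GH16_SIGNATURE = re.compile(r"(E)[LIV]D[LIV].{0,1}(E).{2}[GQ][KRNF].[PSTA]")


def active_site_glutamates(sequence: str) -> list:
    """Return (first E, second E) as 1-based positions for every match."""
    return [
        (m.start(1) + 1, m.start(2) + 1)
        for m in GH16_SIGNATURE.finditer(sequence)
    ]


# x(0,1) allows zero or one residue between the [LIV] and the second glutamate,
# so the spacing between the two reported positions is either 4 or 5.
with_gap = active_site_glutamates("AAELDIGEWMGKRPTT")
without_gap = active_site_glutamates("ELDIEWMGKRP")
```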

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem.
269:14530-14535(1994).

251. (Glyco_hydro_17) 

Glycosyl hydrolases family 17 signature 

(aka glycosyl_hydro4) 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single 
family on the basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from
various plants. This enzyme may be involved in the defense of plants against pathogens
through its ability to degrade fungal cell wall polysaccharides.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene
BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion
during mating, and in spore release during sporulation.

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section. 
This region contains a conserved tryptophan residue which could be involved in the 
interaction with the glucan substrates [2] and it also contains a conserved glutamate which 
has been shown [3] to act as the nucleophile in the catalytic mechanism. This region was used 
as a signature pattern. 

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ]
[E is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990). 
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl.
Acad. Sci. U.S.A. 91:2785-2789(1994).

252. (Glyco_hydro_3)

Glycosyl hydrolases family 3 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3),
Hansenula anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera
(BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1),
Butyrivibrio fibrisolvens (bglA), Clostridium thermocellum (bglB),
Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus
albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding
Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid 
residue which has been shown [3], in Aspergillus wentii beta-glucosidase A3, to be
implicated in the catalytic mechanism. This region was used as a signature pattern. 

Consensus pattern: [LIVM](2)-[KR]-x-[EQ [ST]-D-x(2)-[SGADNI] [D is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992). 
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980). 

253. (Glyco_hydro_28)
Polygalacturonase active site (aka PG) 

Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of 
1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, 
polygalacturonase plays an important role in cell wall metabolism during ripening. In plant 
bacterial pathogens such as Erwinia carotovora or Pseudomonas solanacearum and fungal 
pathogens such as Aspergillus niger, polygalacturonase is involved in maceration and
soft-rotting of plant tissue.

Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from
the non-reducing end, releasing digalacturonate.

Prokaryotic and eukaryotic PG and exoPG share a few regions of sequence similarity. The
best conserved of these regions was selected as a signature pattern. It is centered on a
conserved histidine most probably involved in the catalytic mechanism [4].

Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S
[H is the putative active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5]. 

[ 1] Ruttowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. 
Biophys. Acta 1087:104-106(1990). 

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990). 

[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991). 

[ 5] Henrissat B. Biochem. J. 280:309-316(1991). 

254. (Glyco_hydro_32) 

Glycosyl hydrolases family 32 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single 
family on the basis of sequence similarities: 

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.

- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi 
and plants and as sucrase in bacteria (gene sacA or scrB). 

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid 
pRSD2. 

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis. 

One of the conserved regions in these enzymes is located in the N-terminal section and
contains an aspartic acid residue which has been shown [3], in yeast invertase, to be important
for the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. 
Bacteriol. 172:6727-6735(1990). 

[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

255. (Glyco_hydro_1)

Glycosyl hydrolases family 1 signatures 

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium 
strain ATCC 21400, Bacillus polymyxa, and Caldocellum saccharolyticum. 

- Two plant (clover) beta-glucosidases (EC 3.2.1.21).

- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium
Sulfolobus solfataricus (genes bgaS and lacS).

- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as 
Lactobacillus casei, Lactococcus lactis, and Staphylococcus aureus. 

- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB 
and ascB) and from Erwinia chrysanthemi (gene arbB). 

- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).

- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62).
LPH, an integral membrane glycoprotein, is the enzyme that splits lactose
in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is
evolutionarily related to the above glycosyl hydrolases.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid
residue which has been shown [5], in the beta-glucosidase from Agrobacterium, to be directly
involved in glycosidic bond cleavage by acting as a nucleophile. This region was used as a
signature pattern. As a second signature pattern we selected a conserved region, found in the
N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.

Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN]
[E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are 
removed from the LPH precursor by proteolytic processing, have lost the active site 
glutamate and may therefore be inactive [4]. 

Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]
Sequences known to belong to this class detected by the pattern: ALL.

Note: this pattern will pick up the last three domains of LPH. 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992). 

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am.
Chem. Soc. 112:5887-5889(1990).

256. Glyco_hydro_20 
Glycosyl hydrolase family 20 
Previous Pfam IDs: glycosyl_hydr11;
Number of members: 33 

257. (Glyco_hydro_9) 

Glycosyl hydrolases family 9 active sites signatures 
(aka Glycosyl_hydrl2)

The microbial degradation of cellulose and xylans requires several types of enzymes such as
endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes
(cellulases) and xylanases which, on the basis of sequence similarities, can be classified into
families. One of these families is known as the cellulase family E [3] or as the glycosyl
hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family
are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).

- Cellulomonas fimi endoglucanases B (cenB) and C (cenC). 

- Clostridium cellulolyticum endoglucanase G (celCCG). 

- Clostridium cellulovorans endoglucanase C (engC). 

- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).

- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).

- Fibrobacter succinogenes endoglucanase A (endA). 

- Pseudomonas fluorescens endoglucanase A (celA). 

- Streptomyces reticuli endoglucanase 1 (cel1).

- Thermomonospora fusca endoglucanase E-4 (celD). 

- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime 
mold enzyme may digest the spore cell wall during germination, to release the enclosed 
amoeba. 

- Endoglucanases from plants such as Avocado or French bean. In plants this enzyme may be
involved in the fruit ripening process.

Two of the most conserved regions in these enzymes are centered on conserved residues
which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to
be important for the catalytic activity. The first region contains an active site histidine and the
second region contains two catalytically important residues: an aspartate and a glutamate.
Both regions were used as signature patterns.

Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is
an active site residue]
Sequences known to belong to this class detected by the pattern: ALL,
except for Cellulomonas fimi cenC and Streptomyces reticuli cel1.

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are
active site residues]
Sequences known to belong to this class detected by the pattern: ALL,
except for Fibrobacter succinogenes endA whose sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991). 

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989). 
[ 4] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 
266:10313-10318(1991). 

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992). 

258. Matrix protein (MA), p15 (GAG_ma)

The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1].

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol
1993;67:5989-5999.

259. Gag polyprotein, inner coat protein p12 (GAG_P12)

The retroviral p12 is a virion structural protein. p12 is proline rich. The function
carried out by p12 in assembly and replication is unknown. p12C is associated with
pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999. 



260. Glutamine synthetase signatures (GLN-SYNT)

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of
nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine. There
seem to be three different classes of GS [2,3,4]:

- Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical
subunits. The activity of GSI-type enzymes is controlled by the adenylation of a tyrosine
residue. The adenylated enzyme is inactive.

- Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to the
Rhizobiaceae, Frankiaceae, and Streptomycetaceae families (these bacteria also have a
class-I GS). GSII are octamers of identical subunits. Plants have two or more isozymes of
GSII; one of the isozymes is translocated into the chloroplast.

- Class III enzymes (GSIII) have, currently, only been found in Bacteroides fragilis and in
Butyrivibrio fibrisolvens. GSIII is a hexamer of identical chains. It is much larger (about
700 amino acids) than the GSI (450 to 470 amino acids) or GSII (350 to 420 amino acids)
enzymes.

While the three classes of GS are clearly structurally related, the sequence similarities are
not so extensive. As signature patterns three conserved regions were selected. The first
pattern is based on a conserved tetrapeptide in the N-terminal section of the enzyme; the
second one is based on a glycine-rich region which is thought to be involved in ATP-binding.
The third pattern is specific to class I glutamine synthetases and includes the tyrosine
residue which is reversibly adenylated.

Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]
Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S
Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of
adenylation]
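
Since the third pattern is described as specific to class I enzymes and ends on the adenylated tyrosine, a scan with it can both flag a candidate class I glutamine synthetase and report the putative adenylation site. The sketch below is illustrative only: the regular expression is a hand translation of the third consensus pattern, and the function name and the example sequence are hypothetical.

```python
import re

# Hand translation of K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y;
# the capture group marks the tyrosine annotated as the site of adenylation.
GS_CLASS1_SIGNATURE = re.compile(r"K[LIVM].{5}[LIVMA]D[RK][DN][LI](Y)")


def adenylation_sites(sequence: str) -> list:
    """Return the 1-based position of the tyrosine in every signature match."""
    return [m.start(1) + 1 for m in GS_CLASS1_SIGNATURE.finditer(sequence)]


# Made-up sequence containing one copy of the signature.
sites = adenylation_sites("GGKLAAAAALDKNLYPP")
```

An empty result only means that this signature was not found; it does not by itself assign a sequence to class II or class III.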

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith 
W.W. Cold Spring Harbor Symp. Quant. Biol. 52:483-490(1987).

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J.,
Wohlleben W., Tateno Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993). 
[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989). 

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994). 



261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1].
They belong to a very large and well studied family which is widely distributed in many
organisms. The major groups of globins are:

- Hemoglobins (Hb) from vertebrates. Hb is the protein responsible for transporting oxygen
from the lungs to other tissues. It is a tetramer of two alpha and two beta chains. Most
vertebrate species also express specific embryonic or fetal forms of hemoglobin where the
alpha or the beta chains are replaced by a chain with higher oxygen affinity, such as the
gamma, delta, epsilon and zeta chains in mammals.

- Myoglobins (Mb) from vertebrates. Mb is a monomeric protein responsible for oxygen
storage in muscles.

- Invertebrate globins [2]. A wide variety of globins are found in invertebrates. Molluscs
generally have one or two muscle globins which are either monomeric or dimeric. Insects,
such as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes
and annelids have a variety of intracellular and extracellular globins; some of them are
multi-domain polypeptides (from two up to nine-domain globins) and some produce large,
disulfide-bonded aggregates.

- Leghemoglobins (Lb) from the root nodules of leguminous plants. Lb provides oxygen for
bacteroids.

- Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi [3]. These proteins
consist of two distinct domains: an N-terminal globin domain and a C-terminal
FAD-containing reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated
globin is a single domain protein.

All these globins seem to have evolved from a common ancestor. The profile developed to
detect members of the globin family is based on a structural alignment of selected globin
sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin
New-York (1988).
[ 2] Goodman M., Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L.,
Shishikura F., Walz D., Vinogradov S. J. Mol. Evol. 27:236-249(1988).

Plant hemoglobins signature (globin2) 

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants.
Leghemoglobins are structurally and functionally related to hemoglobin and myoglobin. By
providing oxygen to the bacteroids, they are essential for symbiotic nitrogen fixation.
Structurally related hemoglobins from the nodules of non-leguminous plants [2,3], and from
the roots of non-nodulating plants [4] have been recently sequenced. A signature pattern was
developed that picks up the sequences of plant hemoglobins exclusively.

Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).
[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).
[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J.
Nature 331:178-180(1988). 

262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz) 

Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the
reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into
dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of
fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3],
mainly found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base
intermediate between the C-2 carbonyl group of the substrate (dihydroxyacetone
phosphate) and the epsilon-amino group of a lysine residue. In vertebrates, three forms of this
enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain. The
sequence around the lysine involved in the Schiff-base is highly conserved and can be used as
a signature for this class of enzyme.

Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in
Schiff-base formation]-
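
The consensus patterns in this listing follow a PROSITE-style syntax: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, curly braces list excluded residues, and a number in parentheses is a repeat count. As a minimal sketch that is not part of the original specification, such a pattern could be translated into a regular expression roughly as follows. The function name prosite_to_regex and the toy peptide are hypothetical, the bracketed annotation that follows some patterns has to be dropped before conversion, and the terminus anchors of the full syntax are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:  # tolerate a trailing hyphen, as printed in the listing
            continue
        # Separate the residue specification from an optional repeat such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            rx = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            rx = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            rx = core                     # single residue or [allowed residues]
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        pieces.append(rx)
    return "".join(pieces)

# Class-I aldolase active-site pattern quoted above (annotation removed).
ALDOLASE = "[LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN]"
regex = re.compile(prosite_to_regex(ALDOLASE))

# Hypothetical toy peptide, used only to exercise the pattern.
hit = regex.search("MAAVSLEGTLLKPNMV")
print(hit.span(), hit.group())
```

The same helper could be applied to the other consensus patterns in this section, since they use the same element types.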

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990). 

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988). 

263. Glycosyl hydrolases family 11 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as
endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes
(cellulases) and xylanases which, on the basis of sequence similarities, can be classified into
families. One of these families is known as the cellulase family G [3] or as the glycosyl
hydrolases family 11 [4,E1]. The enzymes which are currently known to belong to this family
are listed below.
- Aspergillus awamori xylanase C (xynC).
- Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA).
- Clostridium acetobutylicum xylanase (xynB).
- Clostridium stercorarium xylanase A (xynA).
- Fibrobacter succinogenes xylanase C (xynC), which consists of two catalytic domains that
both belong to family 10.
- Neocallimastix patriciarum xylanase A (xynA).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of
three domains: an N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl
hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal
xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.
- Schizophyllum commune xylanase A.
- Streptomyces lividans xylanases B (xlnB) and C (xlnC).
- Trichoderma reesei xylanases I and II.
Two of the conserved regions in these enzymes are centered on glutamic acid residues
which have both been shown [5], in Bacillus pumilus xylanase, to be necessary for catalytic
activity. Both regions were used as signature patterns.

Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site 
residue] - 

Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is 
an active site residue] - 

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada
H. Biochem. J. 288:117-121(1992).

264. Glycosyl hydrolase family 14 

The members of this family are beta-amylases.



265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of
sequence similarities, classified into a single family:
- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC
21400, Bacillus polymyxa, and Caldocellum saccharolyticum.
- Two plant (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus
solfataricus (genes bgaS and lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus
casei, Lactococcus lactis, and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and
from Erwinia chrysanthemi (gene arbB).
- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral
membrane glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a
large protein of about 1900 residues which contains four tandem repeats of a domain of about
450 residues which is evolutionarily related to the above glycosyl hydrolases.
One of the conserved regions in these enzymes is centered on a conserved
glutamic acid residue which has been shown [5], in the beta-glucosidase from
Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern, a
conserved region found in the N-terminal extremity of these enzymes was selected; this
region also contains a glutamic acid residue.

Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]- 
[CSAGN] [E is the active site residue] 

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are 
removed from the LPH precursor by proteolytic processing, have lost the active site 
glutamate and may therefore be inactive [4]. 

Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x- 
[GSTA]- 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990). 

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am.
Chem. Soc. 112:5887-5889(1990).

266. Glycosyl hydrolases family 2 signatures

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of
sequence similarities, classified into a single family:
- Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and
ebgA), Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae,
Lactobacillus delbrueckii, or Streptococcus thermophilus and from the fungus Kluyveromyces
lactis.
- Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from mammals.
One of the conserved regions in these enzymes is centered on a conserved glutamic acid
residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base
catalyst in the active site of the enzyme. This region was used as a signature pattern. As a
second signature pattern, a highly conserved region located some sixty residues upstream
from the active site glutamate was selected.

Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-
[DN]-x(2)-G-[LIVMFYW](4)-

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]- 
x(2,3)-N-E [E is the active site residue]- 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol.
137:369-380(1991).

[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

267. Glycosyl hydrolases family 3 active site

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3),
Hansenula anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera
(BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1),
Butyrivibrio fibrisolvens (bglA), Clostridium thermocellum (bglB),
Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus
albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding
Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved 
aspartic acid residue which has been shown [3], in Aspergillus wentii beta- 
glucosidase A3, to be implicated in the catalytic mechanism. This 
region was used as a signature pattern. 

Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVM 
x(2)-[SGADNI] [D is the active site residue] 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980). 

268. Glycosyl hydrolases family 8 signature 

The microbial degradation of cellulose and xylans requires several types of enzymes such as
endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes
(cellulases) and xylanases which, on the basis of sequence similarities, can be classified into
families. One of these families is known as the cellulase family D [3] or as the glycosyl
hydrolases family 8 [4,E1]. The enzymes which are currently known to belong to this family
are listed below.
- Acetobacter xylinum endoglucanase cmcAX.
- Bacillus strain KSM-330 acidic endoglucanase K (Endo-K).
- Cellulomonas josui endoglucanase 2 (celB).
- Cellulomonas uda endoglucanase.
- Clostridium cellulolyticum endoglucanase C (celCC).
- Clostridium thermocellum endoglucanase A (celA).
- Erwinia chrysanthemi minor endoglucanase y (celY).
- Bacillus circulans beta-glucanase (EC 3.2.1.73).
- Escherichia coli hypothetical protein yhjM.
The most conserved region in these enzymes is a stretch of about 20 residues that contains
two conserved aspartates. The first aspartate is thought [5] to act as the nucleophile in the
catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-
[FW] [The first D is an active site residue]-

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989). 
[ 4] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

269. Glycosyl hydrolases family 9 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as
endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases)
and xylanases which, on the basis of sequence similarities, can be classified into families.
One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed
below.
- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold
enzyme may digest the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be
involved in the fruit ripening process.
Two of the most conserved regions in these enzymes are centered on conserved residues
which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to
be important for the catalytic activity. The first region contains an active site histidine and the
second region contains two catalytically important residues: an aspartate and a glutamate.
Both regions were used as signature patterns.

Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is
an active site residue]-

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are 
active site residues]- 

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 
55:303-315(1991). 

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989). 
[ 4] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 
266:10313-10318(1991). 

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992). 

270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh) 
Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric 
NAD-binding enzyme common to both the glycolytic and gluconeogenic pathways. A 
cysteine in the middle of the molecule is involved in forming a covalent phosphoglycerol 
thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial 
and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly 
divergent archaebacterial GAPDH [2]. Escherichia coli D-erythrose 4-phosphate 
dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly related to GAPDH [3].

Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]- 

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976). 

[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).

[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995). 



271. Granulins signature

Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple
biological activities. A precursor protein (known as acrogranin) potentially encodes seven
different forms of granulin (grnA to grnG) which are probably released by post-translational
proteolytic processing. A schematic representation of the structure of a granulin is shown
below:

xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx
                                  **************

'C': conserved cysteine probably involved in a disulfide bond.
'*': position of the pattern.

Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars
intercerebralis of migratory locusts [2].

Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four Cs are probably involved in 
disulfide bonds] - 

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).

[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase 

The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a
65 kDa protein that resembles other viral RNA polymerases. HCV replication is thought to
occur in membrane bound replication complexes. These complexes transcribe the positive
strand and the resulting minus strand is used as a template for the synthesis of genomic RNA.
There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].

[1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.



Attorney No. 2750-1237P 


273. (HHH) Helix-hairpin-helix motif. 

[1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497. 

274. HIT family signature 

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family
currently consists of:
- Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1). HINT was
incorrectly thought to be a specific inhibitor of PKC. It has been shown to bind zinc.
- Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase)
(EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a
diadenosine 5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) (EC 3.6.1.29) cleaving
A-5'-PPP-5'A to yield AMP and ADP.
- Yeast proteins HNT1 and HNT2.
- Maize zinc-binding protein ZBP14.
- Escherichia coli hypothetical protein ycfF.
- Haemophilus influenzae hypothetical protein HI0961.
- Helicobacter pylori hypothetical protein HP0404.
- Methanococcus jannaschii hypothetical protein MJ0866.
- Mycobacterium leprae hypothetical protein U296A.
- Synechocystis strain PCC 6803 hypothetical protein slr1234.
- Caenorhabditis elegans hypothetical protein F21C3.3.
- A hypothetical 13.2 Kd protein in hisE 3'region in Azospirillum brasilense.
- A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis.
- A hypothetical 12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942.
All these proteins contain a region with three clustered histidines. This region is
responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region
was originally thought to be implicated in the binding of a zinc ion but was later identified
[4] as part of the alpha-phosphate binding site of a nucleotide-binding domain. As a
signature pattern, the region of the histidine triad was selected.

Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]- 
H-[LIVMF](2)-[PSGA]- 

[ 1] Seraphin B. DNA Seq. 3:177-179(1992). 

[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995). 

[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W.,
Croce C.M., Ohta M., Huebner K. Biochemistry 35:11529-11535(1996).

[ 4] Brenner C., Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M.
Nat. Struct. Biol. 4:231-238(1997).

275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH) 
A number of eukaryotic proteins, which probably are sequence specific DNA-binding 
proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid 
residues. It has been proposed [1] that this domain is formed of two amphipathic helices 
joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH) 
domain mediates protein dimerization and has been found in the proteins listed below 
[2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid 
residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred
to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on 
the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or 
heterodimerization mediated by the HLH domain is independent of, but necessary for DNA 
binding, as two basic regions are required for DNA binding activity. The HLH proteins 
lacking the basic domain (Emc, Id) function as negative regulators since they form 
heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also 
repress transcription although they can bind DNA. The proteins of this subfamily act together 
with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc 
family of cellular oncogenes [4], which is currently known to contain four members: c-myc 
[E3], N-myc, L-myc, and B-myc. The myc genes are thought to play a role in cellular 
differentiation and proliferation. - Proteins involved in myogenesis (the induction of muscle 
cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or
herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans
CeMyoD, and in Drosophila nautilus (nau). - Vertebrate proteins that bind specific DNA
sequences ('E boxes') in various immunoglobulin chains enhancers: E2A or ITF-1 (E12/pan-2
and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic differentiation factor
1 that acts as differentiation factor during neurogenesis. - Vertebrate MAX protein, a
transcription regulator that forms a sequence-specific DNA-binding protein complex with
myc or mad. - Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a 
transcriptional repressor and may antagonize myc transcriptional activity by competing for 
max. - Proteins of the bHLH/PAS superfamily which are transcriptional activators. In
mammals, AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and
SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal PAS domain
proteins (NPAS1 and NPAS2), endothelial PAS domain protein 1 (EPAS1), mouse ARNT2,
and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator
(ARNT), trachealess protein (TRH), and similar protein (SIMA). - Mammalian transcription
factors HES, which repress transcription by acting on two types of DNA sequences, the E 
box and the N box. - Mammalian MAD protein (max dimerizer) which acts as transcriptional 
repressor and may antagonize myc transcriptional activity by competing for max. - 
Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a 
symmetrical DNA sequence that is found in a variety of viral and cellular promoters. - 
Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia. -
Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and MATH-2
which activate E box-dependent transcription in collaboration with E47. - Mammalian stem
cell protein (SCL) (also known as tal1), a protein which may play an important role in
hemopoietic differentiation. SCL is involved, by chromosomal translocation, in stem-cell
leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a
basic DNA-binding domain but are able to form heterodimers with other HLH proteins, 
thereby inhibiting binding to DNA. - Drosophila extra-macrochaetae (emc) protein, which 
participates in sensory organ patterning by antagonizing the neurogenic activity of the 
achaete-scute complex. Emc is the homolog of mammalian Id proteins. - Human Sterol
Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the
sterol regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in
other genes. - Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5
(achaete) and T8 (asense). The AS-C proteins are involved in the determination of the 
neuronal precursors in the peripheral nervous system and the central nervous system. - 
Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. - 
Drosophila atonal protein (ato) which is involved in neurogenesis. - Drosophila daughterless 
(da) protein, which is essential for neurogenesis and sex-determination. - Drosophila deadpan 
(dpn), a hairy-like protein involved in the functional differentiation of neurons. - Drosophila 
delilah (dei) protein, which plays an important role in the differentiation of epidermal cells
into muscle. - Drosophila hairy (h) protein, a transcriptional repressor which regulates the
embryonic segmentation and adult bristle patterning. - Drosophila enhancer of split proteins
E(spl), hairy-like proteins that are active during neurogenesis and also act as transcriptional
repressors. - Drosophila twist (twi) protein, which is involved in the establishment of germ
layers in embryos. - Maize anthocyanin regulatory proteins R-S and LC. - Yeast
centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal
segregation. It binds to a highly conserved DNA sequence, found in centromeres and in
several promoters. - Yeast INO2 and INO4 proteins. - Yeast phosphate system positive
regulatory protein PHO4 which interacts with the upstream activating sequence of several
acid phosphatase genes. - Yeast serine-rich protein TYE7 that is required for ty-mediated
ADH2 expression. - Neurospora crassa nuc-1, a protein that activates the transcription of
structural genes for phosphorus acquisition. - Fission yeast protein esc1 which is involved
in the sexual differentiation process. The schematic representation of the helix-loop-helix
domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx          xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1      Loop     Amphipathic helix 2

The signature pattern developed to detect this domain completely spans the second
amphipathic helix.

Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-
[LIVM]-x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-
[LIVMRKHQ]-
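
The E-box core 'CANNTG' mentioned in this entry is a degenerate DNA motif in which N stands for any nucleotide. As an illustration only, the sketch below scans a DNA string for such sites. The function name find_eboxes and the example fragment are hypothetical and do not come from the specification. Because the degenerate motif CANNTG is its own reverse complement, scanning one strand locates the sites on both.

```python
import re

# Degenerate E-box core C-A-N-N-T-G; the lookahead lets overlapping sites be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(dna: str) -> list:
    """Return (0-based start, site) for every E-box core found in the sequence."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(dna.upper())]

# Hypothetical promoter fragment containing two E-box cores.
sites = find_eboxes("ttCACGTGaaCAGCTGcc")
print(sites)
```

A lookahead is used rather than a plain match so that two cores sharing nucleotides are both reported.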

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991). 
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992). 

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990). 
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994). 

276. HMG14 and HMG17 signature 

High mobility group (HMG) proteins are a family of relatively low molecular weight
non-histone components in chromatin. HMG14 and HMG17 [1], two related proteins of about 100
amino acid residues, bind to the inner side of the nucleosomal DNA thus altering the
interaction between the DNA and the histone octamer. These two proteins may be involved in
the process which maintains transcribable genes in a unique chromatin conformation. The
trout nonhistone chromosomal protein H6 (histone T) also belongs to this family. As a
signature pattern a conserved stretch of 10 residues located in the N-terminal section of
HMG14 and HMG17 was selected.

Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P-

[ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996). 

277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1) 
3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4)
catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. In
vertebrates it is a mitochondrial enzyme which is involved in ketogenesis and in leucine
catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in
mevalonate catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required
for the activity of the enzyme. The region around this residue is perfectly conserved and is
used as a signature pattern.

Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue]- 

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E.,
Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem.
268:4376-4381(1993).

[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).

Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2) 
The following enzymes have been shown [1] to be functionally as well as evolutionarily
related:
- Alpha-isopropylmalate synthase (EC 4.1.3.12) which catalyzes the first step in the
biosynthesis of leucine, the condensation of acetyl-CoA and alpha-ketoisovalerate to form
2-isopropylmalate.
- Homocitrate synthase (EC 4.1.3.21) (gene nifV) which is involved in the biosynthesis of
the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of acetyl-CoA
and alpha-ketoglutarate into homocitrate.
- Soybean late nodulin 56.
- Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392.
Two conserved regions were selected as signature patterns for these enzymes. The first region
is located in the N-terminal section while the second region is located in the central section
and contains two conserved histidine residues which could be implicated in the catalytic
mechanism.

Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K-

Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]- 

[ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991). 

278. (HMG COA synt) Hydroxymethylglutaryl-coenzyme A synthase active site 
Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes
the condensation of acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In
vertebrates there are two isozymes located in different subcellular compartments: a cytosolic
form which is the starting point of the mevalonate pathway which leads to cholesterol and
other sterolic and isoprenoid compounds and a mitochondrial form responsible for ketone
body biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects,
plants and fungi. A cysteine is known to act as the catalytic nucleophile in the first step of
the reaction, the acetylation of the enzyme by acetyl-CoA. The conserved region around this
active site residue was used as a signature pattern.

Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site 
residue] - 

[ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A.,
Hermes J.D. Arch. Biochem. Biophys. 312:1-13(1994). 

279. HMG (high mobility group) box 

280. HSF-type DNA-binding domain signature 

Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock
promoter elements (HSE). HSE is a palindromic element rich with repetitive purine and
pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF is expressed at normal
temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF
from various species show extensive similarity in a region of about 90 amino acids, which
has been shown [3] to bind DNA. Some other proteins also contain an HSF domain; these are:
- Yeast SFL1, a protein involved in cell surface assembly and regulation of the gene related
to flocculation (asexual cell aggregation) [4].
- Yeast transcription factor SKN7 (or BRY1 or POS9), which binds to the promoter elements
SCB and MCB essential for the control of G1 cyclins expression [5].
- Yeast MGA1.
- Yeast hypothetical protein YJR147w.
A pattern was derived from the most conserved part of the HSF DNA-binding domain, its
central region.

Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x- 
[FYW]-[RKH]-K-[LIVM]- 
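
The idealized HSE given in this entry, 5'-nGAAnnTTCnnGAAnnTTCn-3', consists of alternating GAA and TTC units separated by two arbitrary nucleotides. A minimal sketch of searching a DNA string for this exact idealized element follows. The name find_hse and the test sequence are hypothetical, and the sketch makes no attempt to handle elements that deviate from the idealized form.

```python
import re

N = "[ACGT]"
# Idealized element from the text: n GAA nn TTC nn GAA nn TTC n
HSE = re.compile(f"{N}GAA{N}{{2}}TTC{N}{{2}}GAA{N}{{2}}TTC{N}")

def find_hse(dna: str) -> list:
    """Return 0-based start positions of idealized HSE matches (non-overlapping)."""
    return [m.start() for m in HSE.finditer(dna.upper())]

# Hypothetical sequence carrying one idealized HSE.
print(find_hse("ccAGAATGTTCTAGAAGCTTCAgg"))
```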

[ 1] Sorger P.K. Cell 65:363-366(1991).

[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993). 

[ 3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol.
1:605-613(1994).

[ 4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene
85:321-328(1989).

[ 5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995). 

281. Heat shock hsp20 proteins family profile 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by
inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [1].
Amongst them is a family of proteins with an average molecular weight of 20 Kd, known as
the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect other proteins
against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large
heterooligomeric aggregates; their family is currently composed of the following members:
- Vertebrate heat shock protein hsp27 (hsp25), induced by a variety of environmental stresses.
- Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA and BC.
- Caenorhabditis elegans hsp16 multigene family.
- Fungal HSP26 (budding yeast) and hsp30 (Neurospora crassa and Aspergillus nidulans).
- Plant small hsp's. Plants have four classes of hsp20: classes I and II which are
cytoplasmic, class III which is chloroplastic and class IV which is found in the
endomembrane.
- Alpha-crystallin A and B chains. Alpha-crystallin is an abundant constituent of the eye
lens of most vertebrate species. Its main function appears to be to maintain the correct
refractive index of the lens. It is also found in other tissues where it seems to act as a
chaperone [6].
- Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20
domains.
- A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from
Clostridium acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca,
Mycobacterium leprae 18 Kd antigen and Mycobacterium tuberculosis 14 Kd antigen.
- Methanococcus jannaschii hypothetical protein MJ0285.
Structurally, this family is characterized by the presence of a conserved C-terminal
domain of about 100 residues. The profile developed to detect members of the hsp20 family
is based on an alignment of this domain.

-Sequences known to belong to this class detected by the profile: ALL.

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem.
225:1-9(1994).



282. Heat shock hsp70 proteins family signatures 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental
stress by the induction of the synthesis of proteins collectively known as heat-shock proteins
(hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 70 Kd,
known as the hsp70 proteins [2,3,4]. In most species, there are many proteins that belong to
the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins
can be found in different cellular compartments (nuclear, cytosolic, mitochondrial,
endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below:
- In Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK
protein. A second protein, hscA, has been recently discovered. dnaK is also found in the
chloroplast genome of red algae.
- In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1,
SSD1 (KAR2), SSE1 (MSI3) and SSE2.
- In Drosophila, there are at least eight different hsp70 proteins: HSP70, HSP68, and HSC-1
to HSC-6.
- In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and GRP78
(also known as the immunoglobulin heavy chain binding protein (BiP)).
- In the sugar beet yellow virus (SBYV), a hsp70 homolog has been shown [5] to exist.
- In archaebacteria, hsp70 proteins are also present [6].

All proteins belonging to the hsp70 family bind ATP. A variety of functions has been
postulated for hsp70 proteins. It now appears [7] that some hsp70 proteins play an important
role in the transport of proteins across membranes. They also seem to be involved in protein
folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns
for the hsp70 family of proteins were derived: the first centered on a conserved pentapeptide
found in the N-terminal section of these proteins, the two others on conserved regions
located in the central part of the sequence.

Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]-

Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-
[LIVM]-[LIVMFC]-

Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-
[DEQKRSTA]-
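
By way of illustration, the three hsp70 signatures can be applied together to a candidate sequence. The sketch below assumes a hand translation of the patterns into Python regular expressions (x becomes ".", x(3) becomes ".{3}"); the names HSP70_SIGNATURES and scan_hsp70, and the short test fragment, are hypothetical and chosen only for this example.

```python
import re

# Hand-translated from the three consensus patterns above
# (x -> ".", x(3) -> ".{3}", bracketed sets unchanged).
HSP70_SIGNATURES = {
    "signature_1": re.compile(r"[IV]DLGT[ST].[SC]"),
    "signature_2": re.compile(
        r"[LIVMF][LIVMFY][DN][LIVMFS]G[GSH][GS][AST].{3}[ST][LIVM][LIVMFC]"),
    "signature_3": re.compile(
        r"[LIVMY].[LIVMF].GG.[ST].[LIVM]P.[LIVM].[DEQKRSTA]"),
}


def scan_hsp70(sequence: str) -> dict:
    """Return the 1-based start position of the first hit per signature, or None."""
    seq = sequence.upper()
    hits = {}
    for name, regex in HSP70_SIGNATURES.items():
        match = regex.search(seq)
        hits[name] = match.start() + 1 if match else None
    return hits


# Hypothetical N-terminal fragment, used for illustration only.
fragment = "MGKIIGIDLGTTNSCVAV"
print(scan_hsp70(fragment))
```

A full-length hsp70 sequence would be expected to match all three signatures; a short fragment such as the one above can match only the signature whose region it covers.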

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).
[ 3] Pelham H.R.B. Nature 332:776-777(1988).
[ 4] Craig E.A. BioEssays 11:48-52(1989).
[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol.
217:603-610(1991).
[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).



283. Heat shock hsp90 proteins family signature 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by
the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1].
Amongst them is a family of proteins, with an average molecular weight of 90 Kd, known as
the hsp90 proteins. Proteins known to belong to this family are:
- Escherichia coli and other bacteria heat shock protein c62.5 (gene htpG).
- Vertebrate hsp 90-alpha (hsp 86) and hsp 90-beta (hsp 84).
- Drosophila hsp 82 (hsp 83).
- Trypanosoma cruzi hsp 85.
- Plants Hsp82 or Hsp83.
- Yeast and other fungi HSC82 and HSP82.
- The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse, GRP94 in
hamster, and hsp 108 in chicken).

The exact function of hsp90 proteins is not yet known. In higher eukaryotes, hsp 90
has been found associated with steroid hormone receptors, with tyrosine kinase oncogene
products of several retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are
probable chaperonins that possess ATPase activity [2,3]. As a signature pattern for the hsp90
family of proteins, a highly conserved region found in the N-terminal part of these proteins
was selected.



Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]-

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988). 

[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993). 

[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994). 



284. Helix-turn-helix (HTH3) 

This large family of DNA-binding helix-turn-helix proteins includes Cro
Swiss:P03036 and CI Swiss:P03034.




285. Heme oxygenase signature 

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries
out the oxidation of heme; it cleaves the heme ring at the alpha methene bridge to form
biliverdin and carbon monoxide. Biliverdin is subsequently converted to bilirubin by
biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3.
The first two isozymes differ in their tissue expression and their inducibility: HO-1 is
highly inducible by its substrate heme and by various non-heme substances, while HO-2 is
non-inducible. It has been suggested [2] that HO-2 could be implicated in the production of
carbon monoxide in the brain, where it is said to act as a neurotransmitter.

In the genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme
oxygenase (gene pbsA) that is the key enzyme in the synthesis of the chromophoric part of
the photosynthetic antennae [3]. A heme oxygenase is also present in the bacterium
Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron
from the host heme [4].

There is, in the central section of these enzymes, a well conserved region centered on a
histidine residue which is proposed to play a key role in binding the substrate heme at the
active center of the enzyme. This region was used as a signature pattern.



Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]-

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).
[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

286. Hepatitis core antigen. 

The core antigen of hepatitis viruses possesses a carboxyl
terminus rich in arginine. On this basis it was predicted 
that the core antigen would bind DNA [1]. There is some 
experimental evidence to support this [2]. 

[1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mckay P,
Leadbetter G, Murray K; Nature 1979;282:575-579.
[2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M,
Milanesi G; J Virol 1989;63:4645-4652.

287. Histidine biosynthesis protein 

Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained 
in this family. Histidine is formed by several complex and distinct biochemical reactions 
catalysed by eight enzymes. The enzymes in this Pfam entry are called His6 and His7 in 
eukaryotes and HisA and HisF in prokaryotes.

[1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E,
Cavalieri D, Perito B, Polsinelli M, Gene 1997;197:9-17.
[2] Fani R, Lio P, Chiarelli I, Bazzicalupo M, J Mol Evol 1994;38:489-495.

288. Histone deacetylase family 

Histones can be reversibly acetylated on several lysine residues. Regulation of 
transcription is caused in part by this mechanism. Histone deacetylases catalyse the removal 
of the acetyl group. Histone deacetylases are related to other proteins [1].

[1] Leipe DD, Landsman D, Nucleic Acids Res 1997;25:3693-3697.

289. Histidinol dehydrogenase signature 
Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis
of histidine in bacteria, fungi, and plants: the four-electron oxidation of L-histidinol to
histidine.

In bacteria HDH is a single chain polypeptide; in fungi it is the C-terminal domain
of a multifunctional enzyme which catalyzes three different steps of histidine biosynthesis;
and in plants it is expressed as a nuclear-encoded protein precursor which is exported to the
chloroplast [1].

As a signature pattern, a highly conserved region located in the central part of
HDH was selected. This region does not correspond to the part of the enzyme that, in most
but not all HDH sequences, contains a cysteine residue which, in Salmonella typhimurium,
has been said [2] to be important for the catalytic activity of the enzyme.

Consensus pattern: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-
[AV]-[SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H-

[ 1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl.
Acad. Sci. U.S.A. 88:4133-4137(1991).
[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).



290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes the NAD-dependent reduction of
aspartate beta-semialdehyde into homoserine. This reaction is the third step in a pathway
leading from aspartate to homoserine. The latter participates in the biosynthesis of threonine
and then isoleucine as well as in that of methionine. HDh is found either as a single chain
protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal
aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli
and in plants. As a signature pattern, the best conserved region of HDh has been selected. This
is a segment of 23 to 24 residues located in the central section that contains two
conserved aspartate residues.

Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-
D-x(3)-K-
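
The stated segment length of 23 to 24 residues follows from the pattern itself, since the only variable element is x(2,3). A minimal sketch of that arithmetic is given below; the function name pattern_length_range is an assumption made for illustration and handles only the notation that appears in this pattern (residues, bracketed sets, x, and repeat counts).

```python
import re

# A residue, bracketed set or "x", optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(?:\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_range(pattern: str) -> tuple:
    """Return (shortest, longest) number of residues a pattern can match."""
    shortest = longest = 0
    for element in "".join(pattern.split()).strip("-").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        low, high = match.groups()
        low = int(low) if low else 1       # no count means exactly one residue
        high = int(high) if high else low  # (n) alone means exactly n residues
        shortest += low
        longest += high
    return shortest, longest


HDH_PATTERN = ("A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-"
               "D-x(3)-K-")
print(pattern_length_range(HDH_PATTERN))
```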

[ 1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

291. haloacid dehalogenase-like hydrolase 

This family is structurally different from the alpha/beta hydrolase family
(abhydrolase). This family includes L-2-haloacid dehalogenase, epoxide hydrolases and
phosphatases. The structure of the family consists of two domains. One is an inserted four
helix bundle, which is the least well conserved region of the alignment, between residues 16
and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.

[1] Hisano T, Hata Y, Fujii T, Liu JQ, Kurihara T, Esaki N, Soda K, J Biol Chem 1996;
271:20322-20330.

292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis
of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid
unwinding. Proteins currently known to belong to this family are:
- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high
molecular weight complex involved in 5' cap recognition and the binding of mRNA to
ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the
pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and
related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is
involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic
posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB
gene for ribosomal protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably
interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific
to this family while others are shared by other ATP-binding proteins or by proteins
belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the
'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some
other proteins belong to a subfamily which have His instead of the second Asp and are thus
said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this
subfamily are:
- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
ATP-requiring steps of the pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X
chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV
light, bulky adducts or cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA
excision repair protein XPD (ERCC-2) are the homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell
cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to
initiate transcription from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns were developed for both subfamilies.

Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-

Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]-
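
As a sketch of how the two signatures separate the subfamilies, the patterns can be written as regular expressions and tried in turn. The function name classify_helicase_motif and the short test fragments are hypothetical; only the two consensus patterns come from the text above.

```python
import re

# Direct translations of the two consensus patterns above.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase_motif(sequence: str):
    """Return the subfamily whose signature is found in the sequence, or None."""
    seq = sequence.upper()
    if DEAD_BOX.search(seq):
        return "D-E-A-D-box"
    if DEAH_BOX.search(seq):
        return "D-E-A-H-box"
    return None


# Hypothetical fragments, invented for illustration only.
print(classify_helicase_motif("GKVLDEADEMLSRG"))
print(classify_helicase_motif("TSHVIVDEAHERTL"))
print(classify_helicase_motif("MKTAYIAKQR"))
```

Checking the D-E-A-D signature first is an arbitrary choice in this sketch; a sequence carrying both motifs would need to be reported differently.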

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif
'A' (P-loop) (see the relevant entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski
P.P. Nature 337:121-122(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier
for several membrane-bound oxygenases [1]. There are two homologous forms of b5, one
found in microsomes and one found in the outer membrane of mitochondria. Two conserved 
histidine residues serve as axial ligands for the heme group. The structure of a number of 
oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that 
of b5 and either a flavodehydrogenase or a molybdopterin domain. These enzymes are: 

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a 
flavodehydrogenase domain and a heme-binding domain called cytochrome b2. 

- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate
assimilation in plants, fungi and bacteria [3,4]. Consists of a molybdopterin
domain (see <PDOC00484>), a heme-binding domain called cytochrome b557, as
well as a cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the
oxidative degradation of sulfur-containing amino acids. Also consists of a 
molybdopterin domain and a heme-binding domain. 

This family of proteins also includes: 

- TU-36B, a Drosophila muscle protein of unknown function [6]. 

- Fission yeast hypothetical protein SpAC1F12.10c.

- Yeast hypothetical protein YMR073c. 

- Yeast hypothetical protein YMR272c. 

A segment which includes the first of the two histidine heme ligands was used as a
signature pattern for the heme-binding domain of the cytochrome b5 family.

Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]-

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[2] Guiard B. EMBO J. 4:3265-3272(1985).
[3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer
C., Kronenberger J., Caboche M. Mol. Gen. Genet. 209:552-562(1987).
[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A.
85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res.
17:6349-6367(1989).

294. Hexapeptide-repeat containing-transferases signature 

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to
belong to a single family. These proteins are:
- Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine
biosynthesis.
- Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT
involved in the optimization of nitrogenase activity.
- Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an enzyme
involved in the biosynthesis of lactose.
- UDP-N-acetylglucosamine acyltransferase (EC 2.3.1.129) (gene lpxA), an enzyme involved in
the biosynthesis of lipid A, a phosphorylated glycolipid that anchors the
lipopolysaccharide to the outer membrane of the cell.
- UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase (EC 2.3.1.-) (gene lpxD or
firA), which is also involved in the biosynthesis of lipid A.
- Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens,
Bacillus sphaericus, Escherichia coli plasmid IncFII NR79, Pseudomonas aeruginosa,
Staphylococcus aureus plasmid pIP630. These CAT are not evolutionarily related to the main
family of CAT (see <PDOC00093>).
- Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the
O-acetylation of Nod factors.
- Bacterial maltose O-acetyltransferase (EC 2.3.1.79).
- Bacterial tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117) (gene dapD), which
catalyzes the fourth step in the biosynthesis of diaminopimelate and lysine from aspartate
semialdehyde.
- Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) (gene glmU or
gcaD or tms), an enzyme involved in peptidoglycan and lipopolysaccharide biosynthesis.
- Staphylococcus aureus protein capG, which is involved in biosynthesis of type 1 capsular
polysaccharide.
- Yeast hypothetical protein YJL218w, which is highly similar to Escherichia coli lacA.
- Fission yeast hypothetical protein SpAC18B11.09c.
- Methanococcus jannaschii hypothetical protein MJ1064.

These proteins have been shown [3,4] to contain a repeat structure composed of tandem
repeats of a [LIV]-G-x(4) hexapeptide which, in the tertiary structure of lpxA [5], has been
shown to form a left-handed parallel beta helix. Our signature pattern is based on a
fourfold repeat of this hexapeptide.

Consensus pattern: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-
x(2)-[STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]-
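
The tandem arrangement of the [LIV]-G-x(4) hexapeptide described above can be illustrated by counting how many copies occur back to back. The sketch below uses the strict hexapeptide from the text, not the relaxed fourfold signature pattern; the function name longest_tandem_run and the test sequence are hypothetical.

```python
import re

# One or more back-to-back copies of the strict [LIV]-G-x(4) hexapeptide.
HEXAPEPTIDE_RUN = re.compile(r"(?:[LIV]G.{4})+")


def longest_tandem_run(sequence: str) -> int:
    """Return the largest number of consecutive hexapeptide repeats found."""
    seq = sequence.upper()
    best = 0
    for start in range(len(seq)):
        # The greedy "+" extends the run as far as it goes from this start.
        match = HEXAPEPTIDE_RUN.match(seq, start)
        if match:
            best = max(best, (match.end() - match.start()) // 6)
    return best


# Hypothetical sequence with three consecutive repeats, for illustration only.
print(longest_tandem_run("MSIGDNVYVGPNATLGENVKAAA"))
```

A run of four or more repeats would be the minimum consistent with a fourfold signature, although the actual signature pattern tolerates substitutions that this strict count does not.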

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989). 

[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992). 

[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992). 

[ 4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).
[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995). 

295. Hexokinases signature

Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme
that catalyzes the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and
fructose) using MgATP as the phosphoryl donor. In vertebrates there are four major
isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase, which is often
incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells
and plays an important role in modulating insulin secretion; it is a protein of a molecular
mass of about 50 Kd. Hexokinases of types I to III, which have low Km values for glucose,
have a molecular mass of about 100 Kd. Structurally they consist of a very small N-terminal
hydrophobic membrane-binding domain followed by two highly similar domains of 450
residues. The first domain has lost its catalytic activity and has evolved into a regulatory
domain. In yeast there are three different isozymes: hexokinase PI (gene HXK1), PII (gene
HXKB), and glucokinase (gene GLK1). All three proteins have a molecular mass of about 50
Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly
conserved region which has been shown [4] to be involved in substrate binding. A pattern
from that region has been derived.

Consensus pattern: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-
W-T-K-x-[LF]-

[ 1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990).
[ 2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison D., Adams V., McCabe E.R. Genomics
11:1014-1024(1991).
[ 3] Cornish-Bowden A., Luz Cardenas M. Trends Biochem. Sci. 16:281-282(1991).
[ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).

296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which form the
eukaryotic nucleosome core. Using alignments of histone H2A sequences [1,2,E1], a
conserved region in the N-terminal part of H2A was selected as a signature pattern. This
region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's
which are synthesized throughout the cell cycle.

Consensus pattern: [AC]-G-L-x-F-P-V- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

Histone H4 signature (his2) 

Histone H4 is one of the four histones, along with H2A, H2B and H3, which form
the eukaryotic nucleosome core. Along with H3, it plays a central role in nucleosome
formation. The sequence of histone H4 has remained almost invariant in more than 2 billion
years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found in
positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated
[2] and a histidine residue which is implicated in DNA-binding [3].

Consensus pattern: G-A-K-R-H- 
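
The stated position of the pentapeptide can be checked against an H4 sequence. The sketch below assumes the commonly reported N-terminal tail of histone H4, numbered with the initiator methionine removed; that 20-residue string is an assumption supplied here for illustration and does not appear in the text, and the function name locate_signature is likewise hypothetical.

```python
# Assumed first 20 residues of histone H4 (initiator Met removed); not from the text.
H4_N_TERMINAL_TAIL = "SGRGKGGKGLGKGGAKRHRK"


def locate_signature(sequence: str, signature: str = "GAKRH"):
    """Return the 1-based (start, end) of the signature, or None if absent."""
    index = sequence.find(signature)
    if index < 0:
        return None
    return index + 1, index + len(signature)


print(locate_signature(H4_N_TERMINAL_TAIL))
```

Under this numbering the lysine of the pentapeptide falls at position 16 and the histidine at position 18.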

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982). 

[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).

Histone H3 signatures (his3)

Histone H3 is one of the four histones, along with H2A, H2B and H4, which form the
eukaryotic nucleosome core. It is a highly conserved protein of 135 amino acid residues
[1,2,E1]. The following proteins have been found to contain a C-terminal H3-like domain:
- Mammalian centromeric protein CENP-A [3]. Could act as a core histone necessary for the
assembly of centromeres.
- Yeast chromatin-associated protein CSE4 [4].
- Caenorhabditis elegans chromosome III encodes two highly related proteins (F54C8.2 and
F58A4.3) whose C-terminal section is evolutionarily related to the last 100 residues of H3.
The function of these proteins is not yet known.

Two signature patterns were developed. The first one corresponds to a perfectly conserved
heptapeptide in the N-terminal part of H3. The second one is derived from a conserved
region in the central section of H3.

Consensus pattern: K-A-P-R-K-Q-L- 

Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).

[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).

Histone H2B signature (his4) 

Histone H2B is one of the four histones, along with H2A, H3 and H4, which form
the eukaryotic nucleosome core. Using alignments of histone H2B sequences [1,2,E1], a
conserved region was selected in the C-terminal part of H2B.

Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-
[KR]-H-A-[LIVM]-[STA]-E-G-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).



297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number
of Drosophila homeotic and segmentation proteins. It has since been found to be extremely
well conserved in many other animals, including vertebrates. This domain binds DNA
through a helix-turn-helix type of structure. Some of the proteins which contain a homeobox
domain play an important role in development. Most of these proteins are known to be
sequence-specific DNA-binding transcription factors. The homeobox domain has also been
found to be very similar to a region of the yeast mating type proteins. These are sequence-
specific DNA-binding proteins that act as master switches in yeast differentiation by
controlling gene expression in a cell type-specific fashion. A schematic representation of the
homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H'
(for helix), and 't' (for turn).

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
|        |         |         |         |         |         |
1        10        20        30        40        50        60

The pattern developed to detect homeobox sequences is 24 residues long and spans
positions 34 to 57 of the homeobox domain.

Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]- 
[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)- [RKNAIMW] - 

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford
University Press, Oxford, (1994).

[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford
University Press, Oxford, (1994).

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).

[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).

[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987).

'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a 
developmental regulatory system that provides cells with specific positional identities on the 
anterior-posterior (A-P) axis [1]. The hox proteins contain a 'homeobox' domain. In 
Drosophila and other insects, there are eight different Hox genes that are encoded in two gene 
complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes. 
In six of the eight Drosophila Hox genes the homeobox domain is highly similar and a 
conserved hexapeptide is found five to sixteen amino acids upstream of the homeobox
domain. The six Drosophila proteins that belong to this group are antennapedia (Antp),
abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and
ultrabithorax (ubx) and are collectively known as the 'antennapedia' subfamily. In vertebrates
the corresponding Hox genes are known [2] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2,
B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis
elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. As a signature
pattern for this subfamily of homeobox proteins, the conserved hexapeptide was used.

Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]- 

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[ 2] Scott M.P. Cell 71:551-553(1992). 

'Homeobox 1 engrailed-type protein signature (home3) 

Most proteins which contain a 'homeobox* domain can be classified [1,2], on the basis 
of their sequence characteristics, in three subfamilies: engrailed, antennapedia and paired. 
Proteins currently known to belong to the engrailed subfamily are: - Drosophila segmentation 
polarity protein engrailed (en) which specifies the body segmentation pattern and is required 
for the development of the central nervous system. - Drosophila invected protein (inv). - Silk 
moth proteins engrailed and invected, which may be involved in the compartmentalization of 
the silk gland. - Honeybee E30 and E60. - Grasshopper (Schistocerca americana) G-En. - 
Mammalian and bird En-1 and En-2. - Zebrafish Eng-1, -2 and -3. - Sea urchin (Tripneustes 
gratilla) SU-HB-en. - Leech (Helobdella triserialis) Ht-En. - Caenorhabditis elegans ceh-16. 
Engrailed homeobox proteins are characterized by the presence of a conserved region of 
some 20 amino-acid residues located at the C-terminal of the 'homeobox' domain. As a 
signature pattern for this subfamily of proteins, a stretch of eight perfectly conserved residues 
in this region was used. 

Consensus pattern: L-M-A-[EQ]-G-L-Y-N- 

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989). 
[ 2] Gehring W.J. Science 236:1245-1252(1987). 

298. Isocitrate lyase signature (ICL) 

Isocitrate lyase (EC 4.1.3.1) [1,2] is an enzyme that catalyzes the conversion of isocitrate to 
succinate and glyoxylate. This is the first step in the glyoxylate bypass, an alternative to the 
tricarboxylic acid cycle in bacteria, fungi and plants. A cysteine, a histidine and a glutamate 
or aspartate have been found to be important for the enzyme's catalytic activity. Only one 
cysteine residue is conserved between the sequences of the fungal, plant and bacterial 
enzymes; it is located in the middle of a conserved hexapeptide that can be used as a 
signature pattern for this type of enzyme. 

Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]- 
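
Because the cysteine sits at a fixed offset inside the hexapeptide, its position in a sequence follows directly from the start of each signature match. The sketch below illustrates this under stated assumptions: the regular expression is a hand translation of the consensus pattern above, and the function name and toy sequence are hypothetical.

```python
import re
from typing import List

# Hand-translated regex form of K-[KR]-C-G-H-[LMQ].
ICL_SIGNATURE = re.compile(r"K[KR]CGH[LMQ]")

def putative_active_site_cysteines(sequence: str) -> List[int]:
    """Return the 1-based position of the cysteine in each signature match."""
    # The cysteine is the third residue of the hexapeptide, so its 1-based
    # position is the 0-based match start plus 3.
    return [m.start() + 3 for m in ICL_SIGNATURE.finditer(sequence)]

# Hypothetical toy sequence containing two copies of the hexapeptide.
toy_sequence = "MSTKKCGHLAAGGKRCGHMTT"
print(putative_active_site_cysteines(toy_sequence))
```

A match only marks a candidate residue; the pattern itself labels the cysteine as putative.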

[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989). 

[ 2] Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 
107:262-266(1990). 

299. Initiation factor 2 subunit 

This family includes initiation factor 2B alpha, beta and delta subunits from 
eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. Initiation factor 
2 binds to Met-tRNA, GTP and the small ribosomal subunit. 

[1] Kyrpides NC, Woese CR, Proc Natl Acad Sci U S A 1998;95:3726-3730. 

300. Initiation factor 3 signature 

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation 
of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the 
assembly of the ternary initiation complex which consists of the 30S ribosomal subunit, the 
initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it is a basic 
protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that 
enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast 
ribosomal 30S subunits. In its mature form it is a protein of about 400 residues whose central 
section is evolutionarily related to the sequence of bacterial IF-3 [2]. As a signature pattern, a 
highly conserved region was selected, located in the central section of bacterial IF-3 and of 
IF-3(chl). 

Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]- 
[DEQTH]-x(2)-[KRQ]- 

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211- 
216(1993). 

[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994). 

301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD) 

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the 
seventh step in the biosynthesis of histidine in bacteria, fungi and plants. In most organisms it 
is a monofunctional protein of about 22 to 29 Kd. In some bacteria such as Escherichia coli it 
is the C-terminal domain of a bifunctional protein that includes a histidinol-phosphatase 
domain [1]. Two signature patterns were developed that each include two consecutive 
histidine residues. 

Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]- 
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K - 
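
When a family is described by two signature patterns, a scan can report each pattern separately and then check whether both are present. The following is a minimal sketch of that idea; the regular expressions are hand translations of the two patterns above, and the dictionary keys, function names and toy sequence are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hand-translated regex forms of the two IGPD consensus patterns.
IGPD_PATTERNS = {
    "IGPD_1": re.compile(r"[LIVMY][DE].HH.{2}E.{2}[GCA][LIVM][STAC][LIVM]"),
    "IGPD_2": re.compile(r"G.[DN].HH.{2}E[STAGC].[FY]K"),
}

def find_signatures(sequence: str) -> Dict[str, List[int]]:
    """Map each pattern name to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in IGPD_PATTERNS.items()
    }

def has_both_signatures(sequence: str) -> bool:
    """True only if every pattern matches at least once."""
    return all(find_signatures(sequence).values())

# Hypothetical toy sequence built to contain one copy of each pattern.
toy_sequence = "MM" + "LDAHHAAEAAGLSL" + "TTTT" + "GADAHHAAESAFK" + "MM"
print(find_signatures(toy_sequence))
print(has_both_signatures(toy_sequence))
```

Whether a hit on one pattern alone should count as a family member is a policy choice that the text does not settle, so the sketch reports both results.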

[ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 
203:585-606(1988). 

302. Indole-3-glycerol phosphate synthase signature (IGPS) 

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the 
biosynthesis of tryptophan: the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose 
into indol-3-glycerol-phosphate. In some bacteria, IGPS is a single chain enzyme. In others - 
such as Escherichia coli - it is the N-terminal domain of a bifunctional enzyme that also 
catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of 
tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that 
also contains a PRAI C-terminal domain and a glutamine amidotransferase N-terminal 
domain. The N-terminal section of IGPS contains a highly conserved region which X-ray 
crystallography studies [1] have shown to be part of the active site cavity. This region was 
used as a signature pattern for IGPS. 

Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]- 
x(3)-[LIVMFYST]- 

[ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477- 
507(1992). 

303. (IL2) Interleukin 2. 31 members 

304. (ILVD_EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases 
have been shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) 
(gene ilvD or ILV3) which catalyzes the fourth step in the biosynthesis of isoleucine and 
valine, the dehydration of 2,3-dihydroxy-isovaleric acid into alpha-ketoisovaleric acid. - 
6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd) which catalyzes the first step in the 
Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into 
6-phospho-2-dehydro-3-deoxy-D-gluconate. - Escherichia coli hypothetical protein yjhG. Both enzymes 
are proteins of about 600 amino acid residues. Two highly conserved regions have been 
developed as signature patterns. The first pattern is located in the N-terminal part and 
contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. 
The second pattern is located in the C-terminal half. 

Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand] 
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]- 

[ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 
174:4638-4646(1992). 

[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 
137:179-185(1993). 



305. IMP dehydrogenase / GMP reductase signature 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo 
GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP 
dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is 
associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian 
and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase 
isozymes in humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and 
NADPH-dependent reductive deamination of GMP into IMP [3]. It converts nucleobase, nucleoside 
and nucleotide derivatives of G to A nucleotides, and maintains intracellular balance of A and 
G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence 
similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in 
binding IMP. This region was used as a signature pattern. 

Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the 
putative IMP-binding residue] 

[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988). 

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 
265:5292-5295(1990). 

[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988). 

306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 

[1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW; Biochemistry 1994;33:13164-13171. 
[2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW; J Biol Chem 
1997;272:5983-5988. 
[3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A 
1995;92:4853-4856. 
[4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552. 
[5] Neuwald AF, York JD, Majerus PW; FEBS Lett 1991;294:16-18. 



307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, 
Szent-Gyorgyi AG, Cohen C; Nature 1994;368:306-312. 
[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340. 

308. Inosine-uridine preferring nucleoside hydrolase family signature (IU_nuc_hydro) 

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or 
IUNH) is an enzyme first identified in protozoa [1] that catalyzes the hydrolysis of all of the 
commonly occurring purine and pyrimidine nucleosides into ribose and the associated base, 
but has a preference for inosine and uridine as substrates. This enzyme is important for these 
parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host 
purine nucleosides. IUNH from Crithidia fasciculata has been sequenced and characterized; it 
is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be 
important for the catalytic mechanism; it acts as a proton donor to activate the hypoxanthine 
leaving group. IUNH is evolutionarily related to a number of uncharacterized proteins from 
various biological sources, notably: - Escherichia coli hypothetical protein yaaF. - 
Escherichia coli hypothetical protein ybeK. - Escherichia coli hypothetical protein yeiK. - 
Fission yeast hypothetical protein SpAC17G8.02. - Yeast hypothetical protein YDR400w. - 
A hypothetical protein from the archaebacterium Desulfurolobus ambivalens. As a signature 
pattern for these proteins, a highly conserved region was selected, located in the N-terminal 
extremity. This region contains four conserved aspartates that have been shown [2] to be 
located in the active site cavity. 

Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A - 

[ 1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 
35:5963-5970(1996). 

[ 2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 
35:5971-5981(1996). 

309. (Insulinase) 

Insulinase family, zinc-binding region signature 
(aka Peptidase_M16) 

A number of proteases dependent on divalent cations for their activity have been shown [1,2] 
to belong to one family, on the basis of sequence similarity. These enzymes are listed below. 

- Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a 
cytoplasmic enzyme which seems to be involved in the cellular processing of insulin, 
glucagon and other small polypeptides. 

- Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme 
that degrades small peptides. 

- Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the 
transit peptide from the precursor form of proteins imported from the cytoplasm across the 
mitochondrial inner membrane. It is composed of two nonidentical homologous subunits 
termed alpha and beta. The beta subunit seems to be catalytically active while the alpha 
subunit has probably lost its activity. 

- Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase). This 
mammalian enzyme cleaves peptide substrates on the N-terminus of Arg residues in dibasic 
stretches. 

- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the 
coenzyme pyrrolo-quinoline-quinone (PQQ). It is thought to be a protease that cleaves peptide 
bonds in a small peptide (gene pqqA) thus providing the glutamate and tyrosine residues 
necessary for the synthesis of PQQ. 

- Yeast protein AXL1, which is involved in axial budding [3]. 

- Eimeria bovis sporozoite developmental protein. 

- Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus 
influenzae protein. 

- Bacillus subtilis hypothetical protein ymxG. 

- Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1. 

It should be noted that in addition to the above enzymes, this family also includes the core 
proteins I and II of the mitochondrial bc1 complex (also called cytochrome c reductase or 
complex III), but the situation as to the activity or lack of activity of these subunits is quite 
complex: 

- In mammals and yeast, core proteins I and II lack enzymatic activity. 

- In Neurospora crassa and in potato core protein I is equivalent to the beta subunit of MPP. 

- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive. 

These proteins do not share many regions of sequence similarity; the most noticeable is in the 
N-terminal section. This region includes a conserved histidine followed, two residues later, by 
a glutamate and another histidine. In pitrilysin, it has been shown [4] that this H-x-x-E-H 
motif is involved in enzyme activity; the two histidines bind zinc and the glutamate is 
necessary for catalytic activity. Non active members of this family have lost from one to three 
of these active site residues. We developed a signature pattern that detects active members of 
this family as well as some inactive members. 

Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]- 
[LMFAT]-x-[LFSTH]-x-[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site 
residue]. Sequences known to belong to this class detected by the pattern: all active 
members as well as all MPP alpha subunits and core II subunits. Does not detect inactive core 
I subunits. 
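
The description above distinguishes active members, which keep all three residues of the H-x-x-E-H motif, from inactive ones that have lost one to three of them. As a rough sketch, assuming the five-residue window has already been aligned to the motif by some other means, the retained residues can simply be counted. The function names and example windows are hypothetical.

```python
# Template for the H-x-x-E-H motif; 'x' marks positions that are not constrained.
ACTIVE_SITE_TEMPLATE = "HxxEH"

def retained_active_site_residues(window: str) -> int:
    """Count how many of the three motif residues (H, E, H) a window retains."""
    if len(window) != len(ACTIVE_SITE_TEMPLATE):
        raise ValueError("window must be exactly five residues long")
    return sum(
        1
        for expected, observed in zip(ACTIVE_SITE_TEMPLATE, window.upper())
        if expected != "x" and expected == observed
    )

def has_complete_motif(window: str) -> bool:
    """True when both histidines and the glutamate are present."""
    return retained_active_site_residues(window) == 3

# Hypothetical aligned windows, not taken from real sequences.
for window in ("HFLEH", "HFLQH", "RFLQH"):
    print(window, retained_active_site_residues(window), has_complete_motif(window))
```

A complete motif is consistent with activity but does not demonstrate it; the count is only a quick screen.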

Note: these proteins belong to family M16 in the classification of peptidases [5]. 

[ 1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991). 

[ 2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995). 

[ 3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992). 

[ 4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 
372:567-570(1994). 

[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 

310. Involucrin repeat 

Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF, J Invest Dermatol 
1993;100:613-617. 

311. Isochorismatase family. The members of this family are hydrolase enzymes. 

Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L, J 
Mol Biol 1992;226:1111-1130. 

312. Inositol monophosphatase family signatures (inositol_P) 

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins 
are enzymes of the inositol phosphate second messenger signaling pathway: - Vertebrate and 
plant inositol monophosphatase (EC 3.1.3.25). - Vertebrate inositol polyphosphate 
1-phosphatase (EC 3.1.3.57). The function of the other proteins is not yet clear: - Bacterial 
protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenosine 
5'-phosphosulfate), or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations 
in suhB result in the enhanced synthesis of heat shock sigma factor (htpR). - Neurospora 
crassa protein Qa-X. Probably involved in quinate metabolism. - Emericella nidulans protein 
qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22 [2] involved 
in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein 
YHR046c. - Caenorhabditis elegans hypothetical protein F13G3.5. - A Rhizobium 
leguminosarum hypothetical protein encoded upstream of the pss gene for exopolysaccharide 
synthesis. - Methanococcus jannaschii hypothetical protein MJ0109. It is suggested [1] that 
these proteins may act by enhancing the synthesis or degradation of phosphorylated 
messenger molecules. From the X-ray structure of human inositol monophosphatase [3], it 
seems that some of the conserved residues are involved in binding a metal ion and/or the 
phosphate group of the substrate. 

Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x- 
[HKRNSTY] [The first D and the T bind a metal ion] 

Consensus pattern: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)- 
[GH]-[GA]- 
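
The first pattern contains a variable-length element, x(0,1), so the annotated metal-binding residues do not sit at a fixed offset from the start of a match. One way to recover their positions, sketched here under stated assumptions, is to mark them with named groups in a hand-translated regular expression. The group names `first_asp` and `ser_thr`, the function name and the toy fragments are all hypothetical.

```python
import re
from typing import List, Tuple

# Hand-translated regex for the first inositol monophosphatase pattern.
# Named groups mark the first aspartate and the [ST] position that the
# pattern annotation associates with metal binding.
INOSITOL_P_1 = re.compile(
    r"[FWV].{0,1}[LIVM](?P<first_asp>D)P[LIVM]D[SG](?P<ser_thr>[ST]).{2}[FY].[HKRNSTY]"
)

def metal_ligand_positions(sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based (aspartate, Ser/Thr) positions for every match."""
    return [
        (m.start("first_asp") + 1, m.start("ser_thr") + 1)
        for m in INOSITOL_P_1.finditer(sequence)
    ]

# Hypothetical fragments: one without and one with the optional extra residue.
without_spacer = "WIDPLDGTAAFAH"
with_spacer = "WQIDPLDGTAAFAH"
print(metal_ligand_positions(without_spacer))
print(metal_ligand_positions(with_spacer))
```

The two fragments differ only in the optional residue, which shifts both reported positions by one.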

[ 1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991). 

[ 2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. 
EMBO J. 12:3105-3110(1993). 

[ 3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992). 

313. Ion transport protein 

This family contains sodium, potassium and calcium ion channels. The domain consists of six 
transmembrane helices, in which the last two helices flank a loop that determines ion 
selectivity. In some sub-families (e.g. Na channels) the domain is repeated four times, 
whereas in others (e.g. K channels) the protein forms a tetramer in the membrane. A 
bacterial structure of the protein is known for the last two helices, but it is not included in 
the Pfam family because it lacks the first four helices. 

314. Isocitrate and isopropylmalate dehydrogenases signature (isodh) 

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism 
which catalyzes the oxidative decarboxylation of isocitrate into alpha-ketoglutarate. IDH is 
either dependent on NAD+ (EC 1.1.1.41) or on NADP+ (EC 1.1.1.42). In eukaryotes there are 
at least three isozymes of IDH: two are located in the mitochondrial matrix (one 
NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is 
cytoplasmic. In Escherichia coli the activity of a NADP+-dependent form of the enzyme is 
controlled by the phosphorylation of a serine residue; the phosphorylated form of IDH is 
completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85) (IMDH) [3,4] 
catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative 
decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase 
(EC 1.1.1.93) [5] catalyzes the oxidation of tartrate to oxaloglycolate. These enzymes are 
evolutionarily related [1,3,4,5]. The best conserved region of these enzymes is a glycine-rich 
stretch of residues located in the C-terminal section. This region was used as a signature 
pattern. 

Consensus pattern: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)- 
[SGAP]-x(3,4)-G-[STG]-[LIVMPA]-G-[LIVMF]- 

[ 1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud 
R.M. Proc. Natl. Acad. Sci. U.S.A. 86:8635-8639(1989). 

[ 2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991). 

[ 3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 
222:725-738(1991). 

[ 4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995). 

[ 5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994). 

315. Jacalin-like lectin domain. 

Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. 
The domain is also found in the animal prostatic spermine-binding protein (Swiss:P15501). 

[1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia 
A, Vijayan M; Nat Struct Biol 1996;3:596-603. 

316. KH domain 

KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain 
protein, cause paraneoplastic opsoclonus ataxia. 
[1] Burd CG, Dreyfuss G, Science 1994;265:615-621. 

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A, 
Cell 1996;85:237-245. 

317. Kelch motif 

The kelch motif was initially discovered in Kelch (Swiss:O04652). In this protein 
there are six copies of the motif. It has been shown that Swiss:O04652 is related to Galactose 
Oxidase [1] for which a structure has been solved [2]. The kelch motif forms a beta sheet. 
Several of these sheets associate to form a beta propeller structure as found in ngur, 

[1] Bork P, Doolittle RF, J Mol Biol 1994;236:1277-1282. 
[2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KD, Knowles PF, 
Nature 1991;350:87-90. 

318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature 

The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of 
proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine 
proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases 
as well as some proteins that are probably involved in seed storage. This family is currently 
known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from 
soybean. - Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 
from sandal bead tree. - Trypsin inhibitors 1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from 
goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor from silk tree. - 
Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI 
from potato [2], which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha- 
amylase/subtilisin inhibitors from barley and wheat. - Albumin-1 (WBA-1) from goa bean 
seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein. - Sporamin from 
sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) 
from potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed 
protein from cocoa [8]. All these proteins contain from 170 to 200 amino acid residues and 
one or two intrachain disulfide bonds. The best conserved region is found in their N-terminal 
section and is used as a signature pattern. 

Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x- 
[LIVM] 

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980). 

[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle 
D.J., Barrett A.J., Turk V. FEBS Lett. 267:13-15(1990). 

[ 3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989). 

[ 4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaja K., Nakamura Y., Kurihara Y. J. Biol. 
Chem. 264:6655-6659(1989). 

[ 5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989). 

[ 6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993). 

[ 7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 
14:51-59(1989). 

[ 8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991). 

319. Beta-ketoacyl synthases active site 

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of 
malonyl-ACP with the growing fatty acid chain. It is found as a component of the following 
enzymatic systems: - Fatty acid synthetase (FAS), which catalyzes the formation of long- 
chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant 
chloroplast FAS are composed of eight separate subunits which correspond to different 
enzymatic activities; beta-ketoacyl synthase is one of these polypeptides. Fungal FAS 
consists of two multifunctional proteins, FAS1 and FAS2; the beta-ketoacyl synthase domain 
is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single 
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section 
[2]. - The multifunctional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum 
[3]. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic 
and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase 
enzyme systems. Polyketides are secondary metabolites produced by microorganisms and 
plants from simple fatty acids. KAS is one of the components involved in the biosynthesis of 
the Streptomyces polyketide antibiotics granaticin [4], tetracenomycin C [5] and 
erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the 
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. 
- Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the 
synthesis of the nodulation Nod factor fatty acyl chain. - Yeast mitochondrial protein CEM1. 
The condensation reaction is a two step process: the acyl component of an activated acyl 
primer is transferred to a cysteine residue of the enzyme and is then condensed with an 
activated malonyl donor with the concomitant release of carbon dioxide. The sequence 
around the active site cysteine is well conserved and can be used as a signature pattern. 

Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C 
is the active site residue] 

[ 1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. 
Commun. 53:357-370(1988). 

[ 2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 
198:571-579(1991). 

[ 3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487- 
498(1990). 

[ 4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727- 
2736(1989). 

[ 5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO 
J. 8:2717-2725(1989). 

320. Kinesin motor domain signature and profile 

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in 
organelle transport. Kinesin is an oligomeric complex composed of two heavy chains and two 
light chains. The kinesin motor activity is directed toward the microtubule's plus end. The 
heavy chain is composed of three structural domains: a large globular N-terminal domain 
which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind 
and move on microtubules), a central alpha-helical coiled coil domain that mediates the 
heavy chain dimerization; and a small globular C-terminal domain which interacts with other 
proteins (such as the kinesin light chains), vesicles and membranous organelles. A number of 
proteins have been recently found that contain a domain similar to that of the kinesin 'motor' 
domain [1,4]: - Drosophila claret segregational protein (ncd). Ncd is required for normal 
chromosomal segregation in meiosis, in females, and in early mitotic divisions of the embryo. 
The ncd motor activity is directed toward the microtubule's minus end. - Drosophila 
kinesin-like protein (nod). Nod is required for the distributive chromosome segregation of 
nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that 
associates with kinetochores during chromosome congression, relocates to the spindle 
midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E 
is probably an important motor molecule in chromosome movement and/or spindle 
elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity 
is directed toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for 
yeast nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear 
fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins which are required for the 
assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to 
produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which 
is essential for spindle body duplication during mitotic division. - Emericella nidulans bimC, 
which plays an important role in nuclear division. - Emericella nidulans klpA. - 
Caenorhabditis elegans unc-104, which may be required for the transport of substances 
needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, 
which may be involved in mitosis. - Arabidopsis thaliana KatA, KatB and KatC. - 
Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in 
the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans 
hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of 
most of the above proteins, with the exception of KAR3, klpA, and ncd where it is located in 
the C-terminal section. The kinesin motor domain contains about 330 amino acids. An 
ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain is 
involved in microtubule-binding. The signature pattern for that domain is derived from a 
conserved decapeptide inside the microtubule-binding part. 

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G- 
[SAN]-E 
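
Several of the signatures in this section can be applied to one sequence in a single pass by keeping them in a small lookup table. The sketch below assumes three hand-translated regular expressions (the engrailed, isocitrate lyase and kinesin patterns given in this section); the table keys, the function name and the toy sequence are illustrative assumptions.

```python
import re
from typing import List, Tuple

# Hand-translated regex forms of three consensus patterns from this section.
SIGNATURES = {
    "engrailed": re.compile(r"LMA[EQ]GLYN"),
    "isocitrate_lyase": re.compile(r"K[KR]CGH[LMQ]"),
    "kinesin_motor": re.compile(r"[GSA][KRHPSTQVM][LIVMF].[LIVMF][IVC]DL[AH]G[SAN]E"),
}

def scan(sequence: str) -> List[Tuple[int, str, str]]:
    """Return (1-based start, signature name, matched text), sorted by position."""
    hits = [
        (m.start() + 1, name, m.group())
        for name, regex in SIGNATURES.items()
        for m in regex.finditer(sequence)
    ]
    return sorted(hits)

# Hypothetical toy sequence carrying an engrailed-type and a kinesin-type motif.
toy_sequence = "MM" + "LMAQGLYN" + "GG" + "SKLNLVDLAGSE" + "TT"
for start, name, text in scan(toy_sequence):
    print(start, name, text)
```

Note that `finditer` reports non-overlapping matches only, which is adequate for short signatures like these but would miss overlapping occurrences.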

[ 1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995). 

[ 2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990). 

[ 3] Brady S.T. Trends Cell Biol. 5:159-164(1995). 

[ 4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991). 

321. Ribosomal protein L15 signature 

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia 
coli, L15 is known to bind the 23S rRNA. It belongs to a family of ribosomal proteins which, 
on the basis of sequence similarities [1], groups: - Eubacterial L15. - Plant chloroplast L15 
(nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila 
L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. 
As a signature pattern, a conserved region was selected in the C-terminal section of these 
proteins. 

Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]- 
x(3,4)-[LIVMFCA]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G 

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993). 

322. LBP / BPI / CETP family signature 

The following mammalian lipid-binding serum glycoproteins belong to the same family 
[1,2,3]: - Lipopolysaccharide-binding protein (LBP). LBP binds to the lipid A moiety of 
bacterial lipopolysaccharides (LPS), a glycolipid present in the outer membrane of all Gram- 
negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor and may 
be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein 
(BPI). Like LBP, BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - 
Cholesteryl ester transfer protein (CETP). CETP is involved in the transfer of insoluble 
cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer protein (PLTP). 
May play a key role in extracellular phospholipid transport and modulation of HDL particles. 
These proteins are structurally related and share many regions of sequence similarity. As a 
signature pattern one of these regions was selected, which is located in the N-terminal section 
of these proteins; a region which could be involved in the binding to the lipids [2]. 

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]- 
[EQK]-x(8)-P 

[ 1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., 
Tobias P.S., Ulevitch R.J. Science 249:1429-1431(1990). 

[ 2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. 
Chem. 264:9505-9509(1989). 

[ 3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara 
P.J., Marcovina S.M., Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994). 

323. LIM domain signature and profile 

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich 
domain of about 60 amino-acid residues. These proteins are: - Caenorhabditis elegans mec-3; 
a protein required for the differentiation of the set of six touch receptor neurons in this 
nematode. - Caenorhabditis elegans lin-11; a protein required for the asymmetric division of 
vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one 
of the two cis-acting protein-binding domains of the insulin gene. - Vertebrate homeobox 
proteins lim-1, lim-2 (lim-5) and lim-3. - Vertebrate lmx-1, which acts as a transcriptional 
activator by binding to the FLAT element; a beta-cell-specific transcriptional enhancer found 
in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the 
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila 
protein apterous, required for the normal development of the wing and haltere imaginal discs. - 
Vertebrate protein kinases LIMK-1 and LIMK-2. - Mammalian rhombotins. Rhombotin-1 
(RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about 160 amino 
acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - 
Mammalian and avian cysteine-rich protein (CRP), a 192 amino-acid protein of unknown 
function. Seems to interact with zyxin. - Mammalian cysteine-rich intestinal protein (CRIP), 
a small protein which seems to have a role in zinc absorption and may function as an 
intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. 
- Mouse testin. Mouse testin should not be confused with rat testin which is a thiol protease 
homolog. - Sunflower pollen specific protein SF3. - Chicken zyxin. Zyxin is a low-abundance 
adhesion plaque protein which has been shown to interact with CRP. - Yeast protein LRG1 
which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein 
RGA1/DBM1. - Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans 
homeobox protein unc-97. - Yeast hypothetical protein YKR090w. - Caenorhabditis elegans 
hypothetical protein C28H8.6. These proteins generally have two tandem copies of a domain, 
called LIM (for Lin-11 Isl-1 Mec-3), in their N-terminal section. Zyxin and paxillin 
are exceptions in that they contain respectively three and four LIM domains at their C- 
terminal extremity. In apterous, isl-1, LH-2, lin-11, lim-1 to lim-3, lmx-1, ceh-14 and mec-3 
there is a homeobox domain some 50 to 95 amino acids after the LIM domains. In the LIM 
domain, there are seven conserved cysteine residues and a histidine. The arrangement 
followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C- 
x(16,21)-C-x(2,3)-[CHD]. The LIM domain binds two zinc ions [5]. LIM does not bind DNA; 
rather, it seems to act as an interface for protein-protein interaction. A pattern was developed 
that spans the first half of the LIM domain. 

Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] 

[The 5 Cs and the H bind zinc] 
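
Because the note identifies which pattern positions bind zinc, a match can also report where those residues fall in a sequence. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above, the function name `lim_zinc_ligand_positions` is hypothetical, and the capture groups are an assumption about which positions the note refers to (the four fixed cysteines, the fixed histidine, and the [CH] position).

```python
import re

# Hand translation of the LIM consensus pattern; each parenthesized group marks one
# position assumed to be a zinc ligand.
LIM_REGEX = re.compile(
    r"(C).{2}(C).{15,21}[FYWH](H).{2}([CH]).{2}(C).{2}(C).{3}[LIVMF]"
)


def lim_zinc_ligand_positions(sequence: str):
    """Return 1-based positions of the putative zinc ligands of the first match, or None."""
    match = LIM_REGEX.search(sequence.upper())
    if match is None:
        return None
    # match.start(i) is 0-based, so add 1 to get residue numbering.
    return [match.start(i) + 1 for i in range(1, 7)]
```

For a sequence with several LIM domains, `LIM_REGEX.finditer` could be used in place of `search` to report every copy.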

[ 1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990). 

[ 2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992). 

[ 3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994). 

[ 4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 
22:3151-3154(1994). 

[ 5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. 
U.S.A. 90:4404-4408(1993). 

324. (LRR) Leucine Rich Repeat 

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich 
Repeats are short sequence motifs present in a number of proteins with diverse functions and 
cellular locations. These repeats are usually involved in protein-protein interactions. Each 
Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated non- 
globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains. 
Number of members: 3017 

[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends 
Biochem Sci 1994;19:415-421. [2] Crystal structure of porcine ribonuclease inhibitor, a 
protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature 1993;366:751-756. 

325. Plant lipid transfer protein family signature (LTP) 

Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able 
to facilitate the transfer of phospholipids and other lipids across membranes. These proteins, 
whose subcellular location is not yet known, could play a major role in membrane biogenesis 
by conveying phospholipids such as waxes or cutin from their site of biosynthesis to 
membranes unable to form these lipids. Plant LTP's are proteins of about 9 Kd (90 amino 
acids) which contain eight conserved cysteine residues all involved in disulfide bridges, as 
shown in the following schematic representation. 

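xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx 
[Schematic: the connector lines showing the disulfide-bond pairing and the '*' markers 
giving the position of the pattern are not recoverable from the source.] 

'C': conserved cysteine involved in a disulfide bond. 
'*': position of the pattern. 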
Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-
[ST]-x(3)-[DN]-C-x(2)-[LIVM] [The two C's are involved in disulfide bonds] 

[1] Wirtz K.WA. Annu. Rev. Biochem. 60:73-99(1991). 
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990). 

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991). 

326. (LAMP) Lysosome-associated membrane glycoproteins signatures 
Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, 
specific to lysosomes, and whose exact biological function is not yet clear. Structurally, the 
lamp proteins consist of two internally homologous lysosome-luminal domains separated by 
a proline-rich hinge region; at the C-terminal extremity there is a transmembrane region 
followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two 
conserved disulfide bonds. This structure is schematically represented in the figure below. 

xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx 
[Schematic: the disulfide-bond connectors above the sequence and the annotation row below it 
(duplicated domains, hinge, transmembrane region TM, cytoplasmic tail C) are not fully 
recoverable from the source.] 

In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken, 
lamp-1 is known as LEP100. The macrophage protein CD68 (or macrosialin) [2] is a heavily 
glycosylated integral membrane protein whose structure consists of a mucin-like domain 
followed by a proline-rich hinge; a single lamp-like domain; a transmembrane region and a 
short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The 
first one is centered on the first conserved cysteine of the duplicated domains. The second 
corresponds to a region that includes the extremity of the second domain, the totality of the 
transmembrane region and the cytoplasmic tail. 

Consensus pattern: [STA]-C-[LIVM]-[LIVMFW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]- 
x(3)-Y [C is involved in a disulfide bond] 

Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G- 
[LIVM](2)-x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C 
is involved in a disulfide bond] 



[ 1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991). 
[ 2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 
268:9661-9666(1993). 

327. Lipolytic enzymes "G-D-S-L" family, serine active site 

Recently [1], a family of lipolytic enzymes has been characterized. This family 
currently consists of the following proteins: 

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase. 
- Xenorhabdus luminescens lipase 1. 
- Vibrio mimicus arylesterase. 
- Escherichia coli acyl-coA thioesterase I (gene tesA). 
- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase. 
- Rabbit phospholipase AdRab-B, an intestinal brush border protein with esterase and 
phospholipase A/lysophospholipase activity that could be involved in the uptake of dietary 
lipids. AdRab-B contains four repeats of about 320 amino acids. 
- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG. 
- A Pseudomonas putida hypothetical protein in trpE-trpG intergenic region. 

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and 
Escherichia coli enzymes. It is located in a conserved sequence motif that can be used as a 
signature pattern for these proteins. 

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G 
[S is the active site residue] 

328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site 
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which 
is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase 
recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride- 
fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing 
currently include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein 
(murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia 
coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli 
lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - 
Escherichia coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli 
peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli rare lipoproteins A and B 
(genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). - 
Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A 
number of Bacillus beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding 
protein (gene oppA). - Borrelia burgdorferi outer surface proteins A and B (genes ospA and 
ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). - 
Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes 
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pep. - Klebsiella pullulanase 
(gene pulA). - Klebsiella pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein 
p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC). - 
Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL). - 
Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center 
cytochrome subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion 
plasmid proteins mxiJ and mxiM. - Streptococcus pneumoniae oligopeptide transport protein 
A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum membrane 
protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid 
protein yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane-associated 
copper-binding protein (this is the first archaebacterial protein known to be modified in such 
a fashion). From the precursor sequences of all these proteins, a consensus pattern and a set of 
rules to identify this type of post-translational modification were derived. 

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is 
the lipid attachment site] Additional rules: 1) The cysteine must be between positions 15 and 
35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first 
seven positions of the sequence. 
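
The consensus pattern and the two additional rules can be checked together. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `find_lipid_attachment_site` is hypothetical, positions are counted from 1 at the start of the precursor, and the range 15 to 35 is treated as inclusive.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, written as a regular expression.
# The pattern spans 11 residues and the cysteine is the last one.
LIPOBOX = r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C"
PATTERN_LENGTH = 11


def find_lipid_attachment_site(precursor: str):
    """Return the 1-based position of a candidate lipid-attachment cysteine, or None."""
    seq = precursor.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", seq[:7]):
        return None
    # A lookahead lets overlapping candidate windows all be examined.
    for match in re.finditer(r"(?=" + LIPOBOX + ")", seq):
        cys_position = match.start() + PATTERN_LENGTH  # 0-based start -> 1-based C position
        # Rule 1: the cysteine must lie between positions 15 and 35 (assumed inclusive).
        if 15 <= cys_position <= 35:
            return cys_position
    return None
```

For an illustrative precursor-like string such as `MKATKLVLGAVILGSTLLAGCSSN`, the function reports the cysteine at position 21; a sequence lacking Lys or Arg in its first seven residues is rejected regardless of the pattern.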

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). 
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988). 
[ 3] von Heijne G. Protein Eng. 2:531-534(1989). 
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. 
Chem. 269:14939-14945(1994). 
329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site 
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which 
is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase 
recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride- 
fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing 
currently include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein 
(murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia 
coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli 
lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - 
Escherichia coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli 
peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli rare lipoproteins A and B 
(genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). - 
Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A 
number of Bacillus beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding 
protein (gene oppA). - Borrelia burgdorferi outer surface proteins A and B (genes ospA and 
ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). - 
Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes 
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pep. - Klebsiella pullulanase 
(gene pulA). - Klebsiella pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein 
p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC). - 
Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL). - 
Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center 
cytochrome subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion 
plasmid proteins mxiJ and mxiM. - Streptococcus pneumoniae oligopeptide transport protein 
A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum membrane 
protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid 
protein yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane-associated 
copper-binding protein (this is the first archaebacterial protein known to be modified in such 
a fashion). From the precursor sequences of all these proteins, a consensus pattern and a set of 
rules to identify this type of post-translational modification have been developed. 

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is 
the lipid attachment site] Additional rules: 1) The cysteine must be between positions 15 and 
35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first 
seven positions of the sequence. 

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). 
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988). 
[ 3] von Heijne G. Protein Eng. 2:531-534(1989). 
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. 
J. Biol. Chem. 269:14939-14945(1994). 

330. (Lum binding) Riboflavin synthase alpha chain family Lum-binding site signature 
The following proteins have been shown [1,2] to be structurally and evolutionarily related: - 
Riboflavin synthase alpha chain (RS-alpha) (gene ribC in Escherichia coli, ribB in Bacillus 
subtilis and Photobacterium leiognathi, RIB5 in yeast). This enzyme synthesizes riboflavin 
from two moles of 6,7-dimethyl-8-(1'-D-ribityl)lumazine (Lum), a pteridine derivative. - 
Photobacterium phosphoreum lumazine protein (LumP) (gene luxL). LumP is a protein that 
modulates the color of the bioluminescence emission of bacterial luciferase. In the presence 
of LumP, light emission is shifted to higher energy values (shorter wavelength). LumP binds 
non-covalently to 6,7-dimethyl-8-(1'-D-ribityl)lumazine. - Vibrio fischeri yellow fluorescent 
protein (YFP) (gene luxY). Like LumP, YFP modulates light emission but towards a longer 
wavelength. YFP binds non-covalently to FMN. These proteins seem to have evolved from 
the duplication of a domain of about 100 residues. In its C-terminal section, this domain 
contains a conserved motif [KR]-V-N-[LI]-E which has been proposed to be the binding site 
for Lum. RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, 
while LumP, which binds one molecule of Lum, has a Glu instead of Lys/Arg in the first 
position of the second copy of the motif. Similarly, YFP, which binds to one molecule of 
FMN, also seems to have a potentially dysfunctional binding site by substitution of Gly for 
Glu in the last position of the first copy of the motif. Our signature pattern includes the Lum- 
binding motif. 

Consensus pattern: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E 
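
The relationship between motif copies and ligand binding described above can be illustrated by counting strict occurrences of the [KR]-V-N-[LI]-E motif. This is only a sketch; the function name `lum_motif_positions` is hypothetical and the sequences used to exercise it are synthetic, not taken from any of the proteins listed.

```python
import re

# Strict form of the proposed Lum-binding motif [KR]-V-N-[LI]-E.
LUM_MOTIF = re.compile(r"[KR]VN[LI]E")


def lum_motif_positions(sequence: str):
    """Return the 1-based start positions of every strict copy of the motif."""
    return [match.start() + 1 for match in LUM_MOTIF.finditer(sequence.upper())]
```

Under this strict reading, a sequence with two perfect copies yields two positions, whereas a copy carrying Glu in the first position (as described for the second copy in LumP) is not counted.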

[ 1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100- 
1104(1991). 

[ 2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992). 
331. Lysyl oxidase putative copper-binding region signature 

Lysyl oxidase (LOX) [1] is an extracellular copper-dependent enzyme that catalyzes the 
oxidative deamination of peptidyl lysine residues in precursors of various collagens and 
elastins. The deaminated lysines are then able to form aldehyde cross-links. LOX binds a 
single copper atom which seems to reside within an octahedral coordination complex which 
includes at least three histidine ligands. Four histidine residues are clustered in a central 
region of the enzyme. This region is thought to be involved in copper-binding and is called 
the 'copper-talon' [1]. This region was used as a signature pattern. 

Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H 

[ 1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993). 

332. Metallo-beta-lactamase superfamily (lactamase_B) 

[1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE, Nucleic Acids Res 
1997;25:1665-1677. [2] Carfi A, Pares S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O, 
EMBO J 1995;14:4914-4921. 

333. L-lactate dehydrogenase active site (ldh1) 

L-lactate dehydrogenase (EC 1.1.1.27) (LDH) [1] catalyzes the reversible NAD-dependent 
interconversion of pyruvate to L-lactate. In vertebrate muscles and in lactic acid bacteria it 
represents the final step in anaerobic glycolysis. This tetrameric enzyme is present in 
prokaryotic and eukaryotic organisms. In vertebrates there are three isozymes of LDH: the M 
form (LDH-A), found predominantly in muscle tissues; the H form (LDH-B), found in heart 
muscle; and the X form (LDH-C), found only in the spermatozoa of mammals and birds. In 
birds and crocodilian eye lenses, LDH-B serves as a structural protein and is known as 
epsilon-crystallin [2]. L-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (L-hicDH) [3] 
catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids 
and L-2-hydroxy-carboxylic acids. L-hicDH is evolutionarily related to LDH's. As a signature 
for LDH's a region was selected that includes a conserved histidine which is essential to the 
catalytic mechanism. 

Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue] 

[ 1] Abad-Zapatero C., Griffith J.P., Sussman J.L., Rossmann M.G. J. Mol. Biol. 198:445- 
467(1987). 

[ 2] Hendriks W., Mulders J.W.M., Bibby M.A., Slingsby C., Bloemendal H., de Jong W.W. 
Proc. Natl. Acad. Sci. U.S.A. 85:7114-7118(1988). 

[ 3] Lerch H.-P., Frank R., Collins J. Gene 83:263-270(1989). 

Malate dehydrogenase active site signature (ldh2) 

Malate dehydrogenase (EC 1.1.1.37) (MDH) [1,2] catalyzes the interconversion of malate to 
oxaloacetate utilizing the NAD/NADH cofactor system. The enzyme participates in the citric 
acid cycle and exists in all aerobic organisms. While prokaryotic organisms contain a single 
form of MDH, in eukaryotic cells there are two isozymes: one which is located in the 
mitochondrial matrix and the other in the cytoplasm. Fungi and plants also harbor a 
glyoxysomal form which functions in the glyoxylate pathway. In plant chloroplasts there is 
an additional NADP-dependent form of MDH (EC 1.1.1.82) which is essential for both the 
universal C3 photosynthesis (Calvin) cycle and the more specialized C4 cycle. As a signature 
pattern for this enzyme a region was chosen that includes two residues involved in the 
catalytic mechanism [3]: an aspartic acid which is involved in a proton relay mechanism, and 
an arginine which binds the substrate. 

Consensus pattern: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY] [D and R are 
the active site residues] 

[ 1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988). 

[ 2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992). 

[ 3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989). 

[ 4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 
32:4308-4313(1993). 

334. Legume lectins signatures 

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. 
These lectins are generally found in the seeds. The exact function of legume lectins is not 
known but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and 
in the protection against pathogens. Legume lectins bind calcium and manganese (or other 
transition metals). Legume lectins are synthesized as precursor proteins of about 230 to 260 
amino acid residues. Some legume lectins are proteolytically processed to produce two 
chains: beta (which corresponds to the N-terminal) and alpha (C-terminal). The lectin 
concanavalin A (conA) from jack bean is exceptional in that the two chains are transposed 
and ligated (by formation of a new peptide bond). The N-terminus of mature conA thus 
corresponds to that of the alpha chain and the C-terminus to the beta chain. Two signature 
patterns specific to legume lectins have been developed: the first is located in the C-terminal 
section of the beta chain and contains a conserved aspartic acid residue important for the 
binding of calcium and manganese; the second one is located in the N-terminal of the alpha 
chain. 

Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and 
calcium] 

Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST] 

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986). 

335. CoA-ligases (ligases-CoA) 

This family includes the CoA ligases succinyl-CoA synthetase alpha and beta chains, 
malate CoA ligase and ATP-citrate lyase. Some members of the family utilise ATP; others use 
GTP. 

[1] Wolodko WT, Fraser ME, James MN, Bridger WA, J Biol Chem 1994;269:10883- 
10890. 



336. Linker histone H1 and H5 family 
Linker histone H1 is an essential component of chromatin structure. H1 links 
nucleosomes into higher order structures. Histone H1 is replaced by histone H5 in some cell 
types. 

[1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM, Nature 
1993;362:219-223. 

337. Lipocalin signature (lip1) 

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and 
lipids share limited regions of sequence homology and a common tertiary structure 
architecture [1 to 5]. This is an eight stranded antiparallel beta-barrel with a repeated +1 
topology enclosing an internal ligand binding site [1,3]. The name 'lipocalin' has been 
proposed [5] for this protein family. Proteins known to belong to this family are listed below 
(references are only provided for recently determined sequences). - Alpha-1-microglobulin 
(protein HC), which seems to bind porphyrin. - Alpha-1-acid glycoprotein (orosomucoid), 
which can bind a remarkable array of natural and synthetic compounds [6]. - Aphrodisin 
which, in hamsters, functions as an aphrodisiac pheromone. - Apolipoprotein D, which 
probably binds heme-related compounds. - Beta-lactoglobulin, a milk protein whose 
physiological function appears to bind retinol. - Complement component C8 gamma chain, 
which seems to bind retinol [7]. - Crustacyanin [8], a protein from lobster carapace, which 
binds astaxanthin, a carotenoid. - Epididymal-retinoic acid binding protein (E-RABP) [9] 
involved in sperm maturation. - Insectacyanin, a moth bilin-binding protein, and a related 
butterfly bilin-binding protein (BBP). - Late Lactation protein (LALP), a milk protein from 
tammar wallaby [10]. - Neutrophil gelatinase-associated lipocalin (NGAL) (p25) (SV-40 
induced 24p3 protein) [11]. - Odorant-binding protein (OBP), which binds odorants. - Plasma 
retinol-binding proteins (PRBP). - Human pregnancy-associated endometrial alpha-2 
globulin. - Probasin (PB), a rat prostatic protein. - Prostaglandin D synthase (EC 5.3.99.2) 
(GSH-independent PGD synthetase), a lipocalin with enzymatic activity [12]. - Purpurin, a 
retinal protein which binds retinol and heparin. - Quiescence specific protein p20K from 
chicken (embryo CH21 protein). - Rodent urinary proteins (alpha-2-microglobulin), which 
may bind pheromones. - VNSP 1 and 2, putative pheromone transport proteins from mouse 
vomeronasal organ [13]. - Von Ebner's gland protein (VEGP) [14] (also called tear lipocalin), 
a mammalian protein which may be involved in taste recognition. - A frog olfactory protein, 
which may transport odorants. - A protein found in the cerebrospinal fluid of the toad Bufo 
marinus with a supposed function similar to transthyretin in transport across the blood brain 
barrier [15]. - Lizard's epididymal secretory protein IV (LESP IV), which could transport 
small hydrophobic molecules into the epididymal fluid during sperm maturation [16]. - 
Prokaryotic outer-membrane protein blc [17]. The sequences of most members of the family, 
the core or kernel lipocalins, are characterized by three short conserved stretches of residues 
[3,18]. Others, the outlier lipocalin group, share only one or two of these [3,18]. A signature 
pattern was built around the first, common to all outlier and kernel lipocalins, which occurs 
near the start of the first beta-strand. 

Consensus pattern: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G- 
{C}-W-[FYWLRH]-x-[LIVMTA] 

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this 
family forms an overall superfamily, called the calycins, with the avidin/streptavidin 
<PDOC00499> and the cytosolic fatty-acid binding proteins <PDOC00188> families [3,19]. 

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990). 

[ 2] Igaraishi M., Nagata A., Toh H., Urade H., Hayaishi N. Proc. Natl. Acad. Sci. U.S.A. 
89:5376-5380(1992). 

[ 3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993). 
[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988). 
[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987). 

[ 6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989). 

[ 7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991). 

[ 8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 
197:407-417(1991). 

[ 9] Newcomer M.E. Structure 1:7-18(1993). 

[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993). 

[11] Kjeldsen L., Johnsen A.H., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425- 
10432(1993). 

[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991). 

[13] Miyawaki A., Matsushita Y.R., Ryo Y., Mikoshiba T. EMBO J. 13:5835-5842(1994). 

[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994). 

[15] Achen M.G., Harms P.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. 
J. Biol. Chem. 267:23170-23174(1992). 

[16] Morel L., Dufarre J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993). 

[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.V., Weiner J.H. J. Biol. Chem. 
270:23097-23103(1995). 

[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69- 
74(1991). 

[19] Flower D.R. FEBS Lett. 333:99-102(1993). 
Cytosolic fatty-acid binding proteins signature (lip2) 

A number of low molecular weight proteins which bind fatty acids and other organic anions 
are present in the cytosol [1,2]. Most of them are structurally related and have probably 
diverged from a common ancestor. This structure is a ten stranded antiparallel beta-barrel, 
albeit with a wide discontinuity between the fourth and fifth strands, with a repeated +1 
topology enclosing an internal ligand binding site [2,7]. Proteins known to belong to this 
family include: - Six, tissue-specific, types of fatty acid binding proteins (FABPs) found in 
liver, intestine, heart, epidermal, adipocyte, brain/retina. Heart FABP is also known as 
mammary-derived growth inhibitor (MDGI), a protein that reversibly inhibits proliferation of 
mammary carcinoma cells. Epidermal FABP is also known as psoriasis-associated FABP [3]. 
- Insect muscle fatty acid-binding proteins. - Testis lipid binding protein (TLBP). - Cellular 
retinol-binding proteins I and II (CRBP). - Cellular retinoic acid-binding protein (CRABP). - 
Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen secretion. It seems 
that gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and 
MFB2 from the midgut of the insect Manduca sexta [4]. In addition to the above cytosolic 
proteins, this family also includes: - Myelin P2 protein, which may be a lipid transport 
protein in Schwann cells. P2 is associated with the lipid bilayer of myelin. - Schistosoma 
mansoni protein Sm14 [5] which seems to be involved in the transport of fatty acids. - 
Ascaris suum p18, a secreted protein that may play a role in sequestering potentially toxic 
fatty acids and their peroxidation products or that may be involved in the maintenance of the 
impermeable lipid layer of the eggshell. - Hypothetical fatty acid-binding proteins F40F4.2, 
F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As a signature pattern for these 
proteins a segment from the N-terminal extremity was used. 

Consensus pattern: [GSATVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]- 
[LIVM]-x(2)-[LIVMAKR] 

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this 
family forms an overall superfamily, called the calycins, with the lipocalin <PDOC00187> 
and avidin/streptavidin <PDOC00499> families [6,7]. 

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987). 

[ 2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1- 
24(1991). 

[ 3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. 
Biochem. J. 302:363-371(1994). 

[ 4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 
267:380-384(1992). 

[ 5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991). 
[ 6] Flower D.R., North A.C.T, Attwood T.K. Protein Sci. 2:753-761(1993). 
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993). 

338. Lipoxygenases iron-binding region signatures 

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases that catalyze the 
hydroperoxidation of lipids containing a cis,cis-1,4-pentadiene structure. They are common 
in plants, where they may be involved in a number of diverse aspects of plant physiology 
including growth and development, pest resistance, and senescence or responses to wounding 
[1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of 
prostaglandins and leukotrienes [2]. Sequence data are available for the following 
lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12). Plants express a variety of cytosolic 
isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian arachidonate 
5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 
1.13.11.31). - Mammalian erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron 
atom in lipoxygenases is bound by four ligands, three of which are histidine residues [4]. Six 
histidines are conserved in all lipoxygenase sequences; five of them are found clustered in a 
stretch of 40 amino acids. This region contains two of the three histidine iron ligands; the other 
histidines have been shown [5] to be important for the activity of lipoxygenases. As 
signatures for this family of enzymes two patterns in the region of the histidine cluster were 
selected. The first pattern contains the first three conserved histidines and the second pattern 
includes the fourth and the fifth. 






Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The 
second and third H's bind iron]- 

Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H- 
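
Because the first pattern marks its second and third histidines as the iron-binding residues, a match can also report where those two residues lie. The following is a minimal sketch under stated assumptions: the regular expression is a hand translation of the first pattern, and the test peptide is synthetic, built only to satisfy the pattern.

```python
import re

# First lipoxygenase pattern, hand-translated to a regex. The two capture
# groups are the second and third histidines, annotated as iron ligands.
LOX_PATTERN_1 = re.compile(r"H[EQ].{3}(H).[LM][NQRC][GST](H)[LIVMSTAC]{3}E")

def iron_binding_histidines(sequence):
    """Return 1-based positions of the two iron-binding His for each match."""
    return [(m.start(1) + 1, m.start(2) + 1)
            for m in LOX_PATTERN_1.finditer(sequence)]

# Synthetic peptide constructed to satisfy the pattern; not a real lipoxygenase.
toy = "MKHEAAAHALNGHLLLEK"
print(iron_binding_histidines(toy))  # [(8, 13)]
```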

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, 
Stumpf P.K., Ed., Vol. 9, pp.53-90, Academic Press, New York, (1987). 
[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. 
Biochem. 55:69-102(1986). 

[ 3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 
269:3755-3761(1994). 

[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993). 

[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992). 

339. Fumarate lyases signature (lyase_1) 

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have 
been shown [1,2] to share a short conserved sequence around a methionine which is probably 
involved in the catalytic activity of this type of enzyme. These enzymes are: - Fumarase (EC 
4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration of fumarate to 
L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric enzymes (as 
for example: Escherichia coli fumA and fumB); class II enzymes are thermostable and 
tetrameric and are found in prokaryotes (as for example: Escherichia coli fumC) as well as in 
eukaryotes. The sequences of the two classes of fumarases are not closely related. - Aspartate 
ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion of 
aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, 
except that ammonia rather than water is involved in the trans-elimination reaction. - 
Arginosuccinase (EC 4.3.2.1) (argininosuccinate lyase), which catalyzes the formation of 
arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine. - 
Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in 
the de novo biosynthesis of purines, the formation of 
5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from 
1-(5-phosphoribosyl)-4-(N-succino-carboxamide). 
That enzyme can also catalyze the formation of fumarate and AMP from adenylosuccinate. - 
Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2) 
(3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic 
acids catabolism. 

Consensus pattern: G-S-x(2)-M-x(2)-K-x-N- 

[ 1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988). 

[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988). 

[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992). 

[ 4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. 
Biochemistry 31:9768-9776(1992). 



340. MCM family signature and profile 

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly 
conserved domain of about 210 amino-acid residues [1,2,3]. The latter shows some 
similarities [4] with that of various other families of DNA-dependent ATPases. Eukaryotes 
seem to possess a family of six proteins that contain this domain. They were first identified in 
yeast where most of them have a direct role in the initiation of chromosomal DNA replication 
by interacting directly with autonomously replicating sequences (ARS). They were thus 
called 'minichromosome maintenance proteins' with gene symbols prefixed by MCM. These 
six proteins are: - MCM2, also known as cdc19 (in S.pombe) [E1]. - MCM3, also known as 
DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. - 
MCM4, also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also 
known as CDC46 or nda4 (in S.pombe). - MCM6, also known as mis5 (in S.pombe). - 
MCM7, also known as CDC47 or Prolifera (in A.thaliana). This family is also present in 
archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961, 
MJ1489 and MJECL13. The presence of a putative ATP-binding domain implies that these 
proteins may be involved in an ATP-consuming step in the initiation of DNA replication in 
eukaryotes. As a signature pattern, a perfectly conserved region was selected that represents a 
special version of the B motif found in ATP-binding proteins. 



Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST] 
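
One way to gauge how restrictive such a fixed-length pattern is, offered here only as an illustrative calculation and not as part of the original analysis, is to count the distinct peptides it admits and compare that with all peptides of the same length. The helper name count_matching_peptides is an assumption of this sketch.

```python
import re

# One element: "x", a single residue, or a bracketed set, with optional count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?")

def count_matching_peptides(pattern, alphabet_size=20):
    """Count the distinct peptides a fixed-length consensus pattern can match."""
    cleaned = "".join(pattern.split()).strip("-")
    total = 1
    for element in cleaned.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unrecognized pattern element: %r" % element)
        core, repeat = match.groups()
        if core == "x":
            choices = alphabet_size           # any residue
        elif core.startswith("["):
            choices = len(set(core[1:-1]))    # residues listed in the set
        else:
            choices = 1                       # a single fixed residue
        total *= choices ** int(repeat or 1)
    return total

MCM_PATTERN = "G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]"
allowed = count_matching_peptides(MCM_PATTERN)
possible = 20 ** 9  # all nonapeptides over the 20 standard residues
print(allowed, possible)
```

Under these assumptions the pattern admits 1 x 3 x 4 x 4 x 3 x 1 x 2 x 2 x 4 = 2304 of the 20^9 possible nonapeptides.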
[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992). 

[ 2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993). 

[ 3] Tye B.-K. Trends Cell Biol. 4:160-166(1994). 

[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993). 

341. Macrophage migration inhibitory factor family signature (MIF) 
A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an 
important role in host inflammatory responses. It plays a pivotal role in the host response to 
endotoxic shock and appears to serve as a pituitary "stress" hormone that regulates systemic 
inflammatory responses. MIF is a secreted protein of 115 residues which is not processed 
from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme 
involved in melanin biosynthesis that tautomerizes D-dopachrome with concomitant 
decarboxylation to give 5,6-dihydroxyindole (DHI). It is a protein of 117 residues highly 
related to MIF. It must be noted that MIF binds glutathione and has been said to be related to 
glutathione S-transferases. This assertion was later disproved [3]. As a signature pattern 
for these proteins, a conserved region located in the central section was selected. 

Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G- 
[ 1] Bucala R. Immunol. Lett. 43:23-26(1994). 

[ 2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. 
Res. Commun. 197:619-624(1993). 

[ 3] Pearson W.R. Protein Sci. 3:525-527(1994). 

342. MIP family signature 

Recently the sequences of a number of different proteins, which all seem to be transmembrane 
channel proteins, have been found to be highly related [1 to 4]. These proteins are listed below. 
- Mammalian major intrinsic protein (MIP). MIP is the major component of lens fiber gap 
junctions. Gap junctions mediate direct exchange of ions and small molecules from one cell to 
another. - Mammalian aquaporins [5]. These proteins form water-specific channels that 
provide the plasma membranes of red cells and kidney proximal and collecting tubules with 
high permeability to water, thereby permitting water to move in the direction of an osmotic 
gradient. - Soybean nodulin-26, a major component of the peribacteroid membrane induced 
during nodulation in legume roots after Rhizobium infection. - Plant tonoplast intrinsic 
proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and Wsi 
(water-stress induced). These proteins may allow the diffusion of water, amino acids and/or 
peptides from the tonoplast interior to the cytoplasm. - Bacterial glycerol facilitator protein 
(gene glpF), which facilitates the movement of glycerol across the cytoplasmic membrane. - 
Salmonella typhimurium propanediol diffusion facilitator (gene pduF). - Yeast FPS1, a 
glycerol uptake/efflux facilitator protein. - Drosophila neurogenic protein 'big brain' (bib). 
This protein may mediate intercellular communication; it may function by allowing the 
transport of certain molecule(s) and thereby sending a signal for an exodermal cell to 
become an epidermoblast instead of a neuroblast. - Yeast hypothetical protein YFL054c. - A 
hypothetical protein from the pepX region of Lactococcus lactis. The MIP family proteins 
seem to contain six transmembrane segments. Computer analysis shows that these proteins 
probably arose by a tandem, intragenic duplication event from an ancestral protein that 
contained three transmembrane segments. As a signature pattern a well conserved region was 
selected which is located in a probable cytoplasmic loop between the second and third 
transmembrane regions. 

Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]- 

[ 1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993). 
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990). 

[ 3] Pao G.M., Wu L.-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., 
Saier M.H. Jr. Mol. Microbiol. 5:33-37(1991). 

[ 4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991). 
[ 5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994). 

343. Mandelate racemase / muconate lactonizing enzyme family signatures 
Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) 
(MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze 
mechanistically distinct reactions yet they are related at the level of their primary, quaternary 
(homooctamer) and tertiary structures [1,2]. A number of other proteins also seem to be 
evolutionarily related to these two enzymes. These are: - The various plasmid-encoded 
chloromuconate cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]. rspA 
seems to be involved in the degradation of homoserine lactone (HSL) or of one of its 
metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical 
protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature 
patterns have been developed for these enzymes; both contain conserved acidic residues. The 
second pattern contains an aspartate and a glutamate which are ligands for either a 
magnesium ion (in MR) or a manganese ion (in MLE). 

Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-
[KRQ]-x(4)-[PSA]-[LIV]-x(2)-L-[LIVMF]-G- 

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-
[DENQ]-P [D and E bind a divalent metal ion]- 

[ 1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990). 

[ 2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 
18:372-376(1993). 

[ 3] Huisman G.W., Kolter R. Science 265:537-539(1994). 

[ 4] Schneider D., Aigle B., Leblond P., Simonet J.M., Decaris B. J. Gen. Microbiol. 
139:2559-2567(1993). 

344. Merozoite Surface Antigen 2 (MSA-2) family 

Thomas AW, Carr DA, Carter JM, Lyon JA, Mol Biochem Parasitol 1990;43:211-220. 

345. MSP (Major sperm protein) domain. 

Major sperm proteins are involved in sperm motility. These proteins oligomerise to 
form filaments. Partial matches to this domain are also found in other non MSP proteins. 
These include Swiss:P40075 and Swiss:P34593. 

[1] Bullock TL, Roberts TM, Stewart M, J Mol Biol 1996;263:284-296. [2] King KL, 
Stewart M, Roberts TM, Seavy M, J Cell Sci 1992;101:847-857. 
346. (Matrix) Viral matrix protein. Found in Morbillivirus and paramyxovirus, pneumovirus. 
Number of members: 105 

347. O-methyltransferase (methyltransf) 

This family includes a range of O-methyltransferases. These enzymes utilise S- 
adenosyl methionine. 

[1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH, Appl Environ 
Microbiol 1993;59:479-484. 

348. Magnesium chelatase, subunit ChlI 

Magnesium-chelatase is a three-component enzyme that catalyses the insertion of 
Mg2+ into protoporphyrin IX. This is the first unique step in the synthesis of 
(bacterio)chlorophyll. Due to this, it is thought that Mg-chelatase has an important role in 
channeling intermediates into the (bacterio)chlorophyll branch in response to conditions 
suitable for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 
kDa. 

[1] Walker CJ, Willows RD, Biochem J 1997;327:321-333. [2] Petersen BL, Jensen 
PE, Gibson LC, Stummann BM, Hunter CN, Henningsen KW, J Bacteriol 1998;180:699-704. 

349. Plasmid recombination enzyme (Mob_Pre) 

With some plasmids, recombination can occur in a site specific manner that is 
independent of RecA. In such cases, the recombination event requires another protein called 
Pre. Pre is a plasmid recombination enzyme. This protein is also known as Mob (conjugative 
mobilization). 

[1] Priebe SD, Lacks SA, J Bacteriol 1989;171:4778-4784. 



350. Monooxygenase 

This family includes diverse enzymes that utilise FAD. 
[1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML, 
Science 1994;266:110-114. 

351. Mov34 family 

Members of this family are found in proteasome regulatory subunits, eukaryotic 
initiation factor 3 (eIF3) subunits and regulators of transcription factors. 

[1] Aravind L, Ponting CP, Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, 
Naranda T, Vornlocher HP, Hanachi P, Merrick WC, Biochimie 1996;78:903-907. 

352. Myc amino-terminal region (Myc_N_term) 

The myc family belongs to the basic helix-loop-helix leucine zipper class of 
transcription factors, see HLH. Myc forms a heterodimer with Max, and this complex 
regulates cell growth through direct activation of genes involved in cell replication [2]. 

[1] Facchini LM, Penn LZ, FASEB J 1998;12:633-651. [2] Grandori C, Eisenman 
RN, Trends Biochem Sci 1997;22:177-181. 

353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These 
proteins are cysteine rich proteins that bind to heavy metals. Members of this family appear 
to be closest to Class II metallothioneins, see metalthio. Number of members: 55 

[1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits 
through random sequencing and analysis of redundant metallothionein-like genes expressed 
during fruit development. Moriguchi T, Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 
1998;211:221-227. 

354. MAGE family 

The MAGE (melanoma antigen-encoding gene) family is expressed 
in a wide variety of tumors but not in normal cells, with the 
exception of the male germ cells, placenta, and, possibly, 
cells of the developing embryo. The cellular function of 
this family is unknown. 

[1] McCurdy DK, Tai LQ, Nguyen J, Wang Z, Yang HM, Udar N, 
Naiem F, Concannon P, Gatti RA; Mol Genet Metab 1998;63:3-13. 

355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the 
oxidative decarboxylation of malate into pyruvate, a reaction important for a wide range of 
metabolic pathways. There are three related forms of malic enzyme [1,2,3]: - NAD-dependent 
malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to decarboxylate 
oxaloacetate (OAA). It is found in bacteria and insects. - NAD-dependent malic enzyme (EC 
1.1.1.39), which uses preferentially NAD and is unable to decarboxylate OAA. It is found in 
the mitochondrial matrix of plants and is a heterodimer of highly related subunits. - 
NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference for NADP and has the 
ability to decarboxylate OAA. This form has been found in fungi, animals and plants. In mammals, 
there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two 
isozymes: chloroplastic and cytosolic. There are two other proteins which are closely 
structurally related to malic enzymes: - Escherichia coli protein sfcA, whose function is not 
yet known but which could be an NAD or NADP-dependent malic enzyme. - Yeast 
hypothetical protein YKL029c, a probable malic enzyme. There are three well conserved 
regions in the enzyme sequences. Two of them seem to be involved in binding NAD or 
NADP. The significance of the third one, located in the central part of the enzymes, is not yet 
known. This region has been developed as a signature pattern for these enzymes. 

Consensus pattern: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)- 
[LIVMF](2) 

[ 1] Artus N.N., Edwards G.E. FEBS Lett. 182:225-233(1985). [ 2] Loeber G., Infante A.A., 
Maurer-Fogy I., Krystek E., Dworkin M.B. J. Biol. Chem. 266:3016-3021(1991). [ 3] Long 
J.J., Wang J.-L., Berry J.O. J. Biol. Chem. 269:2827-2833(1994). 



356. (matrixin) 
Matrixins cysteine switch (aka peptidase_M10) 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins 
[1] (see <PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an 
inactive form (zymogen) that differs from the mature enzyme by the presence of an N- 
terminal propeptide. A highly conserved octapeptide is found two residues downstream of the 
C-terminal end of the propeptide. This region has been shown to be involved in 
autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site 
zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or 
'autoinhibitor region'. 

A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase). 

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase). 

- MMP-3 (EC 3.4.24.17) (stromelysin-1). 

- MMP-7 (EC 3.4.24.23) (matrilysin). 

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase). 

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase). 

- MMP-10 (EC 3.4.24.22) (stromelysin-2). 

- MMP-11 (EC 3.4.24.-) (stromelysin-3). 

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase). 

- MMP-13 (EC 3.4.24.-) (collagenase 3). 

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1). 

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2). 

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3). 

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4]. 

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5]. 

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion] 
Sequences known to belong to this class detected by the pattern: ALL, except for cat MMP-7 
and mouse MMP-11. 
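
As a rough illustration of how the cysteine switch pattern could be applied, the sketch below scans a sequence and reports the position of the zinc-chelating cysteine in each hit. The regular expression is a hand translation of the pattern, and the test peptide is a hypothetical fragment chosen to satisfy it, not a sequence taken from this document.

```python
import re

# Cysteine switch pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ], hand-translated.
CYS_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")

def find_cysteine_switches(sequence):
    """Return (1-based position of the chelating Cys, matched octapeptide) pairs."""
    # The cysteine is the third residue of the match, so its 1-based
    # position is the 0-based match start plus 3.
    return [(m.start() + 3, m.group()) for m in CYS_SWITCH.finditer(sequence)]

# Hypothetical propeptide fragment, for illustration only.
toy_propeptide = "AMQKPRCGVPDVAQF"
print(find_cysteine_switches(toy_propeptide))  # [(7, 'PRCGVPDV')]
```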
[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991). 

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. 
Chem. 263:11892-11899(1988). 

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 
266:1584-1590(1991). 

[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990). 

[ 5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. 
U.S.A. 89:4693-4697(1992). 

357. Vertebrate metallothioneins signature (metalthio) 

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, 
copper, cadmium, nickel, etc., through clusters of thiolate bonds. MT's occur throughout the 
animal kingdom and are also found in higher plants, fungi and some prokaryotes. On the 
basis of structural relationships MT's have been subdivided into three classes. Class I includes 
mammalian MT's as well as MT's from crustaceans and molluscs that have a clearly related 
primary structure. Class II groups together MT's from various species such as sea urchins, 
fungi, insects and cyanobacteria which display no or only very distant correspondence to 
class I MT's. Class III MT's are atypical polypeptides containing gamma-glutamylcysteinyl 
units. Vertebrate class I MT's are proteins of 60 to 68 amino acid residues; 20 of these 
residues are cysteines that bind to 7 bivalent metal ions. As a signature pattern a region that 
spans 19 residues and contains seven of the metal-binding cysteines was chosen; this 
region is located in the N-terminal section of class-I MT's. 

Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K- 
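
The statement that the signature spans 19 residues and contains seven cysteines can be checked directly against the pattern. The following is a small sketch (the helper name expand_fixed_pattern is an assumption) that expands the pattern into one token per residue position and counts the fixed cysteines:

```python
def expand_fixed_pattern(pattern):
    """Expand a fixed-length consensus pattern into one token per residue position."""
    cleaned = "".join(pattern.split()).strip("-")
    positions = []
    for element in cleaned.split("-"):
        # "x(2)" splits into core "x" and rest "2)"; plain elements have no rest.
        core, _, rest = element.partition("(")
        repeat = int(rest.rstrip(")")) if rest else 1
        positions.extend([core] * repeat)
    return positions

MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K-"
positions = expand_fixed_pattern(MT_PATTERN)
span = len(positions)
cysteines = [i + 1 for i, token in enumerate(positions) if token == "C"]
print(span, cysteines)  # 19 [1, 3, 7, 9, 12, 14, 17]
```

The result agrees with the text: 19 positions, with fixed cysteines at seven of them.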

[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986). 

[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988). 

[ 3] Binz P.-A. Thesis, 1996, University of Zurich. 



358. Mitochondrial energy transfer proteins signature (mito_carr) 
Different types of substrate carrier proteins involved in energy transfer are found in the inner 
mitochondrial membrane [1 to 5]. These are: - The ADP,ATP carrier protein (AAC) 
(ADP/ATP translocase) which exports ATP into the cytosol and imports ADP into the 
mitochondrial matrix. The sequence of AAC has been obtained from various mammalian, 
plant and fungal species. - The 2-oxoglutarate/malate carrier protein (OGCP), which exports 
2-oxoglutarate into the cytosol and imports malate or other dicarboxylic acids into the 
mitochondrial matrix. This protein plays an important role in several metabolic processes 
such as the malate/aspartate and the oxoglutarate/isocitrate shuttles. - The phosphate carrier 
protein, which transports phosphate groups from the cytosol into the mitochondrial matrix. - 
The brown fat uncoupling protein (UCP) which dissipates oxidative energy into heat by 
transporting protons from the cytosol into the mitochondrial matrix. - The tricarboxylate 
transport protein (or citrate transport protein) which is involved in citrate-H+/malate 
exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source 
for fatty acid and sterol biosyntheses, and NAD for the glycolytic pathway. - The Graves' 
disease carrier protein (GDC), a protein of unknown function recognized by IgG in patients 
with active Graves' disease. - Yeast mitochondrial proteins MRS3 and MRS4. The exact 
function of these proteins is not known. They suppress a mitochondrial splice defect in the 
first intron of the COB gene and may act as carriers, exerting their suppressor activity by 
modulating solute concentrations in the mitochondrion. - Yeast mitochondrial FAD carrier 
protein (gene FLX1). - Yeast protein ACR1 [6], which seems essential for acetyl-CoA 
synthetase activity. - Yeast protein PET8. - Yeast protein PMT. - Yeast protein RIM2. - Yeast 
protein YHM1/SHM1. - Yeast protein YMC1. - Yeast protein YMC2. - Yeast hypothetical 
proteins YBR291c, YEL006w, YER053c, YFR045w, YHR002w, and YIL006w. - 
Caenorhabditis elegans hypothetical protein K11H3.3. Two other proteins have been found to 
belong to this family, yet are not localized in the mitochondrial inner membrane: - Maize 
amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a 
role in amyloplast membrane transport. - Candida boidinii peroxisomal membrane protein 
PMP47 [7]. PMP47 is an integral membrane protein of the peroxisome and it may play a role 
as a transporter. These proteins all seem to be evolutionarily related. Structurally, they 
consist of three tandem repeats of a domain of approximately one hundred residues. Each of 
these domains contains two transmembrane regions. As a signature pattern, one of the most 
conserved regions in the repeated domain was selected, located just after the first 
transmembrane region. 
Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]- 
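
Since the signature lies within a domain that is repeated three times in tandem, a carrier sequence may be expected to yield up to three hits spaced roughly one hundred residues apart. The sketch below illustrates that idea on a toy sequence; the sequence is entirely artificial (glycine spacers around a motif built to satisfy the pattern) and the regular expression is a hand translation of the pattern.

```python
import re

# Signature P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM], hand-translated.
MITO_CARR = re.compile(r"P.[DE].[LIVAT][RK].[LRH][LIVMFY][QGAIVM]")

def signature_starts(sequence):
    """Return the 1-based start position of every non-overlapping signature hit."""
    return [m.start() + 1 for m in MITO_CARR.finditer(sequence)]

# Artificial motif and sequence: three 100-residue repeats, each carrying
# the motif after 20 spacer residues. Not a real carrier protein.
MOTIF = "PADALKARLQ"
toy_carrier = ("G" * 20 + MOTIF + "G" * 70) * 3

starts = signature_starts(toy_carrier)
spacings = [b - a for a, b in zip(starts, starts[1:])]
print(starts, spacings)  # [21, 121, 221] [100, 100]
```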

[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990). 

[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992). 

[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993). 

[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993). 

[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993). 

[ 6] Palmieri F. FEBS Lett. 346:48-54(1994). 

[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427- 
428(1993). 

359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin) 
A number of different prokaryotic oxidoreductases that require and bind a molybdopterin 
cofactor have been shown [1,2,3] to share a number of regions of sequence similarity. These 
enzymes are: - Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme 
complex allows the bacteria to use nitrate as an electron acceptor during anaerobic growth. 
The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain 
(gene narG) is the molybdopterin-binding subunit. Escherichia coli encodes a second, 
closely related, nitrate reductase complex which also contains a molybdopterin-binding alpha 
chain (gene narZ). - Escherichia coli anaerobic dimethyl sulfoxide reductase (DMSO 
reductase). DMSO reductase is the terminal reductase during anaerobic growth on various 
sulfoxide and N-oxide compounds. DMSO reductase is composed of three chains: A, B and 
C. The A chain (gene dmsA) binds molybdopterin. - Escherichia coli biotin sulfoxide 
reductases (genes bisC and bisZ). This enzyme reduces a spontaneous oxidation product of 
biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin 
sulfoxide as a biotin source. - Methanobacterium formicicum formate dehydrogenase (EC 
1.2.1.2). The alpha chain (gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor. 
- Escherichia coli formate dehydrogenases -H (gene fdhF), -N (gene fdnG) and -O (gene 
fdoG). These enzymes are responsible for the oxidation of formate to carbon dioxide. In 
addition to molybdopterin, the alpha (catalytic) subunit also contains an active-site 
selenocysteine. - Wolinella succinogenes polysulfide reductase chain. This enzyme is a 
component of the phosphorylative electron transport system with polysulfide as the terminal 
acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA) binds 
molybdopterin. - Salmonella typhimurium thiosulfate reductase (gene phsA). - Escherichia 
coli trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4]. - Nitrate reductase (EC 
1.7.99.4) from Klebsiella pneumoniae (gene nasA), Alcaligenes eutrophus, Escherichia coli, 
Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA), and Synechococcus PCC 
7942 (gene narB). These proteins range from 715 amino acids (fdhF) to 1246 amino acids 
(narZ) in size. Three signature patterns for these enzymes were derived. The first is based on a 
conserved region in the N-terminal section and contains two cysteine residues perhaps 
involved in binding the molybdopterin cofactor. It should be noted that this region is not 
present in bisC. The second pattern is derived from a conserved region located in the central 
part of these enzymes. 

Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x- 
[LIVMA]-x(3,4)-[DENQKHT]- 

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-
E- 

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-
x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]- 

[ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray 
R.C. Biochim. Biophys. Acta 1057:157-185(1991). 

[ 2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988). 
[ 3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994). 
[ 4] Mejean V., Lobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. 
Mol. Microbiol. 11:1169-1179(1994). 



360. Bacterial mutT domain signature 

The bacterial mutT protein is involved in the GO system [1] responsible for removing an 
oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from 
DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite to dA and dC residues of 
template DNA with almost equal efficiency thus leading to A.T to G.C transversions. MutT 
specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of 
pyrophosphate. MutT is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a 
region of about 40 amino acid residues, which is found in the N-terminal part of mutT, can 
also be found in a variety of other prokaryotic, viral, and eukaryotic proteins. These proteins 
are: 

- Streptococcus pneumoniae mutX. 

- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens. 

- Bartonella bacilliformis invasion protein A (gene invA). 

- Escherichia coli dATP pyrophosphohydrolase. 

- Protein D250 from African swine fever viruses. 

- Proteins D9 and D10 from a variety of poxviruses. 

- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4]. 

- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase 
(Ap4Aase) (EC 3.6.1.17) [5], which cleaves A-5'-PPPP-5'A to yield AMP and 
ATP. 

- A protein encoded on the antisense RNA of the basic fibroblast growth factor 
gene in higher vertebrates. 

- Yeast protein YSA1. 

- Escherichia coli hypothetical protein yfaO. 

- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding 
Haemophilus influenzae protein. 

- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding 
Haemophilus influenzae protein. 

- Escherichia coli hypothetical protein yrfE. 

- Bacillus subtilis hypothetical protein yqkG. 

- Bacillus subtilis hypothetical protein yzgD. 

- Yeast hypothetical protein YGL067w. 

It is proposed [2] that the conserved domain could be involved in the active center of 
a family of pyrophosphate-releasing NTPases. As a signature pattern the core region of the 
domain was selected; it contains four conserved glutamate residues. 

Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E- 



[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992). 
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993). 
[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 
11:323-330(1994). 

[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. 
Biol. Chem. 268:23524-23530(1993). 

[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G.
Biochem. J. 311:717-721(1995).

361. Myb DNA-binding domain repeat signatures 

The retroviral oncogene v-myb and its cellular counterpart c-myb encode nuclear
DNA-binding proteins that specifically recognize the sequence YAAC(G/T)G [1]. The myb
family also includes the following proteins:

- Drosophila D-myb [2].

- Vertebrate myb-like proteins A-myb and B-myb [3].

- Maize C1 protein, a trans-acting factor which controls the expression of genes
involved in anthocyanin biosynthesis.

- Maize P protein [4], a trans-acting factor which regulates the biosynthetic pathway
of a flavonoid-derived pigment in certain floral tissues.

- Arabidopsis thaliana protein GL1 [5], required for the initiation of differentiation
of leaf hair cells (trichomes).

- A number of myb/c1-related proteins in maize and barley, whose roles are not yet
known [4].

- Yeast BAS1 [7], a transcriptional activator for the HIS4 gene.

- Yeast REB1 [8], which recognizes sites within both the enhancer and the promoter of
rRNA transcription, as well as upstream of many genes transcribed by RNA polymerase II.

- Fission yeast cdc5, a possible transcription factor whose activity is required for
cell cycle progression and growth during G2.

- Fission yeast myb1, which regulates telomere length and function.

- Yeast hypothetical protein YMR213w.

One of the most conserved regions in all of these proteins is a domain of 160 amino
acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat
region has been shown [9] to be involved in DNA-binding. The major part of the first
repeat is missing in retroviral v-myb sequences and in plant myb-related proteins. Yeast
REB1 differs from the other proteins in this family in having a single myb-like domain.
As shown in the following schematic representation, two signature patterns for myb-like
domains were developed; the first is located in the N-terminal section, the second spans
the C-terminal extremity of the domain.

xxxxxxxxxWxxxEDxxxxxxxxxxxxxxWxxIxxxxxxRxxxxxxxxWxxxx
         *********           ************************

'*': Position of the patterns.



Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]

Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]
Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb;
the second of the two complete copies of plant myb-related proteins, and the last two copies
of yeast BAS1.

[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature
335:835-837(1988).

[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085- 
3090(1987). 

[ 3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic 
Acids Res. 16:11075-11090(1988). 

[ 4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991). 
[ 5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell
67:483-493(1991).

[ 6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. 
Mol. Gen. Genet. 216:183-187(1989). 

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989). 
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990). 
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987). 

362. NAD-dependent glycerol-3-phosphate dehydrogenase signature 
NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the
reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate. It is a
eukaryotic cytosolic homodimeric protein of about 40 Kd. As a signature pattern, a
glycine-rich region that is probably [1] involved in NAD-binding was selected.

Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-
G-[LIVM]-x-[LIVMFYW]-G-x-N

[ 1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980). 



363. Nucleosome assembly protein (NAP) 
It is thought that NAPs may be involved in regulating gene expression as a result of
histone accessibility [1].

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Brie E, Kim J, Reid LH, Davies C,
Nakagama H, Loebbert R, Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G,
Shows T, Weissman BE, Zabel B, Housman DE, Pelletier J; Genomics 1997;44:253-265.
[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet
1996;5:1801-1807.

364. NB-ARC domain 

van der Biezen EA, Jones JD, Curr Biol 1998;8:226-227. 

365. Nucleoside diphosphate kinases active site 

Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the
synthesis of nucleoside triphosphates (NTP) other than ATP. They provide NTPs for nucleic
acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for
protein elongation, signal transduction and microtubule polymerization. In eukaryotes, there
seems to be a small family of NDK isozymes, each of which acts in a different subcellular
compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are
hexamers of two highly related chains (A and B) [2]. By random association (A6, A5B...AB5,
B6), these two kinds of chain form isoenzymes differing in their isoelectric point. NDK are
proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is
phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of
magnesium, the phosphoenzyme can transfer its phosphate group to any NDP to produce an
NTP.

NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has
also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a
microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor
nm23.

The sequence of NDK has been highly conserved through evolution. There is a single
histidine residue conserved in all known NDK isozymes, which is involved in the catalytic
mechanism [2]. Our signature pattern contains this residue.

Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site
residue]
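The random association of A and B chains into hexamers described above can be made concrete with a short sketch. It enumerates the seven possible subunit compositions (A6 through B6) and, under the added assumption that chains combine independently with a fixed fraction of A chains, gives the binomial proportion of each composition. The function names and the `frac_a` parameter are hypothetical; the source states only that association is random, not what the chain ratio is.

```python
from math import comb

def hexamer_compositions(n_subunits: int = 6):
    """Return (label, a_count, b_count) for every A/B composition of the oligomer."""
    result = []
    for a in range(n_subunits, -1, -1):
        b = n_subunits - a
        # Write a count of 1 as a bare letter and omit a count of 0, e.g. "A5B", "B6".
        label_a = "A%d" % a if a > 1 else "A" * a
        label_b = "B%d" % b if b > 1 else "B" * b
        result.append((label_a + label_b, a, b))
    return result

def composition_probabilities(frac_a: float, n_subunits: int = 6):
    """Binomial proportion of each composition when chains associate at random.

    frac_a is the assumed fraction of A chains in the pool (hypothetical input).
    """
    return {
        label: comb(n_subunits, a) * frac_a ** a * (1.0 - frac_a) ** b
        for label, a, b in hexamer_compositions(n_subunits)
    }

if __name__ == "__main__":
    for label, prob in composition_probabilities(0.5).items():
        print(label, round(prob, 4))
```

With equal amounts of both chains, the mixed hexamers dominate and the pure A6 and B6 forms are the rarest, which is consistent with a series of isoenzymes of graded isoelectric point.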
[ 1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).

[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).

[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990). 

366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIRSIR) 
Nitrite reductases (NiR) [1] catalyze the reduction of nitrite into ammonium, the second step
in the assimilation of nitrate. There are two types of NiR: the higher plant chloroplastic form
of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as the electron
donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use
NAD(P)H as the electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur
centers.

Sulfite reductase (NADPH) (EC 1.8.1.2) (SIR) [2] is the bacterial enzyme that
catalyzes the reduction of sulfite to sulfide. SIR is an oligomeric enzyme with a subunit
composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SIR-FP), while the
beta component is a siroheme, iron-sulfur protein (SIR-HP).

Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial and plant monomeric
enzyme that also catalyzes the reduction of sulfite to sulfide.

Anaerobic sulfite reductase (EC 1.8.1.-) (ASR) [4] is a bacterial enzyme
that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an oligomeric
enzyme composed of three different subunits. The C component (gene asrC) seems to be a
siroheme, iron-sulfur protein.

These enzymes share a region of sequence similarity in their C-terminal half; this
region, which spans about 80 amino acids, includes four conserved cysteine
residues. Two of the Cys are grouped together at the beginning of the domain, and the two
others are grouped in the middle of the domain. The cysteines are involved in the binding of
the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from
the region around the second cluster of cysteines was derived.

Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's
are iron-sulfur ligands]

[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990). 

[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995). 

[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta
1144:102-106(1993).

[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures

Myristoyl-CoA:protein N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme
responsible for transferring a myristate group to the N-terminal glycine of a number of
cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to 60 Kd
whose sequence appears to be well conserved. Two highly conserved regions have been
developed as signature patterns. The first one is located in the central section, the second
in the C-terminal part.

Consensus pattern: E-I-N-F-L-C-x-H-K
Consensus pattern: K-F-G-x-G-D-G

[ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375- 
430(1993). 

368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC
2.7.7.27) catalyzes a very important step in the biosynthesis of alpha 1,4-glucans (glycogen
or starch) in bacteria and plants: synthesis of the activated glucosyl donor, ADP-glucose,
from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a tetrameric
allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts
and amyloplasts it is a heterotetramer of two different, yet evolutionarily related, subunits.
There are a number of conserved regions in the sequence of bacterial and plant ADP-glucose
pyrophosphorylase subunits. Three of these regions were selected as signature patterns. The
first two are N-terminal and have been proposed to be part of the allosteric and/or
substrate-binding sites in the Escherichia coli enzyme (gene glgC). The third pattern
corresponds to a conserved region in the central part of the enzymes.



Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]

Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]
[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant
Mol. Biol. 17:1089-1093(1991).

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem.
63:535-544(1991).

369. Sodium/hydrogen exchanger family 

Na/H antiporters are key transporters in maintaining the 
pH of actively metabolizing cells. The molecular mechanisms 
of antiport are unclear. 

These antiporters contain 10-12 transmembrane regions (M) at the 
amino-terminus and a large cytoplasmic region at the carboxyl 
terminus. The transmembrane regions M3-M12 share identity with 
other members of the family. The M6 and M7 regions are highly 
conserved. Thus, this is thought to be the region that is involved 
in the transport of sodium and hydrogen ions. The cytoplasmic 
region has little similarity throughout the family. 

[1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5.
[2] Orlowski J, Grinstein S; J Biol Chem 1997;272:22373-22376.
[3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.

370. Sodium:sulfate symporter family signature (Na_sulph_symp) 

Integral membrane proteins that mediate the intake of a wide variety of molecules with the
concomitant uptake of sodium ions (sodium symporters) can be grouped, on the basis of
sequence and functional similarities, into a number of distinct families. One of these families
currently consists of the following proteins:

- Mammalian sodium/sulfate cotransporter [1].

- Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and
citrate.

- Mammalian intestinal sodium/dicarboxylate cotransporter.

- Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3].

- Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and R107.1.

- Escherichia coli hypothetical protein yfbS.

- Haemophilus influenzae hypothetical protein HI0608.

- Synechocystis strain PCC 6803 hypothetical protein sll0640.

- Methanococcus jannaschii hypothetical protein MJ0672.

These transporters are proteins of 430 to 620 amino acids which are highly
hydrophobic and which probably contain about 12 transmembrane regions. As a signature
pattern, a conserved region was selected which is located in or near the penultimate
transmembrane region.

Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 
90:8073-8077(1993). 

[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996). 

[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996). 

371. NifU-like domain 

This is an alignment of the carboxy-terminal domain. This is the only common region 
between the NifU protein from nitrogen-fixing bacteria and rhodobacterial species. The 
biochemical function of NifU is unknown [1]. 

Ouzounis C, Bork P, Sander C, Trends Biochem Sci 1994;19:199-200. 

372. Nitrilases / cyanide hydratase signatures 

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and
ammonia. They are widespread in microbes as well as in plants, where they convert
indole-3-acetonitrile to the hormone indole-3-acetic acid. A conserved cysteine has been
shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to
formamide. In phytopathogenic fungi, it is used to avoid the toxic effect of cyanide released
by wounded plants [3]. The sequence of cyanide hydratase is evolutionarily related to that of
nitrilases. Yeast hypothetical proteins YIL164c and YIL165c also belong to this family. As
signature patterns for these enzymes, two conserved regions were selected. The first is
located in the N-terminal section while the second, which contains the active site cysteine,
is located in the central section.

Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P
Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is
the active site residue]

[ 1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247- 
251(1993). 

[ 2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 
267:20746-20751(1992). 

[ 3] Wang P., Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992). 

373. NusB family 

The NusB protein is involved in the regulation of rRNA biosynthesis by 
transcriptional antitermination. 

Huenges M, Rolz C, Gschwind R, Peteranderl R, Berglechner F, Richter G, Bacher A,
Kessler H, Gemmecker G, EMBO J 1998;17:4092-4100.

374. (Neur_chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal
transmission at chemical synapses. They are post-synaptic oligomeric transmembrane
complexes that transiently form an ionic channel upon the binding of a specific
neurotransmitter. Presently, the sequences of subunits from five types of
neurotransmitter-gated receptors are known:

- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor
endplates of vertebrates, it is composed of four different subunits (alpha, beta, gamma
and delta or epsilon) with a molar stoichiometry of 2:1:1:1. In neurones, the AchR
is composed of two different types of subunits: alpha and non-alpha (also
called beta). Nicotinic AchRs are also found in invertebrates.

- The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a
pentamer composed of two different subunits (alpha and beta).

- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion
channel. The quaternary structure of the GABA receptor is complex; at least four classes
of subunits are known to exist (alpha, beta, gamma, and delta) and there are many
variants in each class (for example: six variants of the alpha class have
already been sequenced).

- The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a
neurotransmitter, a hormone and a mitogen. There are seven major groups
of serotonin receptors; six of these groups (5HT1, 5HT2, and 5HT4 to 5HT7) transduce
extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific
ion channel which, when activated, causes fast, depolarizing responses in neurons.

- The glutamate receptor, an excitatory cation channel. Glutamate is the main excitatory
neurotransmitter in the brain. At least three different types of glutamate receptors have
been described and are named according to their selective agonists (kainate,
N-methyl-D-aspartate (NMDA) and quisqualate).

All known sequences of subunits from neurotransmitter-gated ion-channels are structurally
related. They are composed of a large extracellular glycosylated N-terminal ligand-binding
domain, followed by three hydrophobic transmembrane regions
which form the ionic channel, followed by an intracellular region of variable length. A fourth
hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits
from the AchR, GABA, 5HT3, and Gly receptors are clearly evolutionarily related and share
many regions of sequence similarity. These sequence similarities are either absent or very
weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/5HT3/Gly
receptors, there are two conserved cysteine residues, which, in AchR, have been shown to
form a disulfide bond essential to the tertiary structure of the receptor. A number of amino
acids between the two disulfide-bonded cysteines are also conserved. Therefore this region
was used as a signature pattern for this subclass of proteins.

Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are
linked by a disulfide bond]

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990). 
[ 2] Betz H. Neuron 5:383-392(1990). 

[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990). 
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992). 

375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step
in the de novo biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher
eukaryotes OMPdecase is part, with orotate phosphoribosyltransferase, of a bifunctional
enzyme, while the prokaryotic and fungal OMPdecases are monofunctional proteins. Some
parts of the sequence of OMPdecase are well conserved across species. The best conserved
region is located in the N-terminal half of OMPdecases and is centered around a lysine
residue which is essential for the catalytic function of the enzyme. This region has been
developed as a signature pattern.

Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the
active site residue]

[ 1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988). 
[ 2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992). 

376. ATP synthase delta (OSCP) subunit signature 

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component
of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria,
and the thylakoid membrane of chloroplasts. The ATPase complex is composed of
an oligomeric transmembrane sector, called CF(0), which acts as a proton
channel, and a catalytic core, termed coupling factor CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria
and chloroplasts or the Oligomycin Sensitivity Conferral Protein (OSCP) in
mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It
either transmits conformational changes from CF(0) into CF(1) or is involved
in proton conduction [3].

The different delta/OSCP subunits are proteins of approximately 200 amino-acid
residues - once the transit peptide has been removed in the chloroplast and
mitochondrial forms - which show only moderate sequence homology.

The signature pattern used to detect ATPase delta/OSCP subunits is based on a
conserved region in the C-terminal section of these proteins.

Consensus pattern: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-
[GSA]-G-[LIVMFYGA]-x-[LIVM]-[KRHENQ]-x-[GSEN]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989). 
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988). 

[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990). 
377. Aspartate and ornithine carbamoyltransferases signature
Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion
of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step
in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a
regulatory chain (gene pyrI), while in eukaryotes it is a domain in a
multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD
in mammals [2]) that also catalyzes other steps of the biosynthesis of
pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion
of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme
participates in the urea cycle [3] and is located in the mitochondrial
matrix. In prokaryotes and eukaryotic microorganisms it is involved in the
biosynthesis of arginine. In some bacterial species it is also involved in the
degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The
predicted secondary structures of both enzymes are similar and there are some
regions of sequence similarity. One of these regions includes three
residues which have been shown, by crystallographic studies [6], to be
implicated in binding the phosphoryl group of carbamoyl phosphate.
This region was selected as a signature for these enzymes.

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

Note: the residue in position 3 of the pattern makes it possible to distinguish between
an ATCase (Glu) and an OTCase (Lys).
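The note on position 3 of this pattern can be illustrated with a short sketch: the pattern is written as a regular expression with the [EK] position captured, and the captured residue decides between ATCase (Glu) and OTCase (Lys). The function name and the test sequences are hypothetical; the sequences are synthetic fragments built to satisfy the pattern, not real enzyme sequences.

```python
import re

# F-x-[EK]-x-S-[GT]-R-T, with position 3 ([EK]) captured as group 1.
CARBAMOYLTRANSFERASE_SIGNATURE = re.compile(r"F.([EK]).S[GT]RT")

def classify_carbamoyltransferase(sequence: str):
    """Return 'ATCase', 'OTCase', or None if the signature is not found."""
    match = CARBAMOYLTRANSFERASE_SIGNATURE.search(sequence)
    if match is None:
        return None
    # Glu at position 3 indicates an ATCase, Lys an OTCase.
    return "ATCase" if match.group(1) == "E" else "OTCase"

if __name__ == "__main__":
    # Synthetic fragments, assumed for illustration only.
    for fragment in ("AAFAEASGRTAA", "AAFMKNSTRTRV", "AAAAAAAA"):
        print(fragment, classify_carbamoyltransferase(fragment))
```

A match on the signature alone is only a hint; the pattern was chosen for conservation around the carbamoyl phosphate binding residues, so a sequence that matches still needs to be checked by other means.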

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986). 

[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays
15:157-164(1993).

[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989). 
[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111- 
117(1987). 
[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A.
81:4864-4868(1984).

[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037- 
4040(1984). 

378. Oleosins signature 

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies 
called oil bodies. Oil bodies are small droplets (0.2 to 1.5 mu-m in diameter) 
containing mostly triacylglycerol that are surrounded by a phospholipid/ 
oleosin annulus. Oleosins may have a structural role in stabilizing the lipid 
body during desiccation of the seed, by preventing coalescence of the oil.
They may also provide recognition signals for specific lipase anchorage in 
lipolysis during seedling growth. Oleosins are found in the monolayer lipid/ 
water interface of oil bodies and probably interact with both the lipid and 
phospholipid moieties. 

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an 
N-terminal hydrophilic region of variable length (from 30 to 60 residues); a 
central hydrophobic domain of about 70 residues and a C-terminal amphipathic 
region of variable length (from 60 to 100 residues). The central hydrophobic 
domain is proposed to be made up of beta-strand structure and to interact with 
the lipids [2]. It is the only domain whose sequence is conserved and therefore 
a section from that domain was selected as a signature pattern. 

Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P- 
[LIVM](3)-P-A 

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards R-W., Jackson P.J.,
Cummins I., Gibbons T., Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992). 

379. (Orbi_VP5) Orbivirus outer capsid protein VP5
This paper shows the location of the different capsid proteins
and their relation to each other.

[1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200. 



380. Orn/DAP/Arg decarboxylases family 2 signatures 

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and 
related substrates can be classified into two different families on the basis 
of sequence similarities [1,2,3]. The second family consists of: 

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the 
transformation of ornithine into putrescine. 

- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC
catalyzes the conversion of diaminopimelic acid into lysine, the last step
in the biosynthesis of lysine.

- Pseudomonas syringae pv. tabaci protein tabA. tabA is probably involved in
the biosynthesis of tabtoxin and is highly similar to DAPDC.

- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19)
(ADC). ADC catalyzes the transformation of arginine into agmatine, the
first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share
extensive regions of sequence similarities. Two of the conserved regions were 
selected as signature patterns. The first pattern contains a conserved lysine 
residue which is known, in mouse ODC [4], to be the site of attachment of the 
pyridoxal-phosphate group. The second pattern contains a stretch of three 
consecutive glycine residues and has been proposed to be part of a substrate- 
binding region [5]. 

These enzymes are collectively known as group IV decarboxylases [3]. 

Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-
[LIVMA]-x(3)-[GTE] [K is the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G- 
[LIVMFY]-[GSTPCEQ] 
[ 1] Bairoch A. Unpublished observations (1993).

[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol.
5:549-559(1988).

[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

381. Osteopontin signature 

Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is
abundant in the mineral matrix of bones and which binds tightly to
hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a
cell attachment factor and could play a key role in the adhesion of
osteoclasts to the mineral matrix of bone.

Osteopontin-K is a kidney protein which is highly similar to osteopontin and
probably also involved in cell-adhesion.

As a signature pattern a highly conserved region located at the
N-terminal extremity of the mature protein was selected.

Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-36(1989). 

[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992). 

[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993). 

382. Oxysterol-binding protein family signature 

A number of eukaryotic proteins that seem to be involved with sterol synthesis
and/or its regulation have been found [1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid
residues that binds a variety of oxysterols: oxygenated derivatives of
cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.

- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that
seem to play a role in ergosterol synthesis.

- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol
synthesis.

- Yeast hypothetical protein YHR001w (437 residues).

- Yeast hypothetical protein YHR073w (996 residues).

- Yeast hypothetical protein YKR003w (448 residues).

All these proteins contain a moderately conserved domain of about 250 residues
located in the C-terminal half of OSBP, OSH1 and YHR073w and in the central
section of the other proteins. As a signature pattern, the best conserved part of
this domain was selected, a region that contains a conserved
pentapeptide.

Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A 

[ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).

383. FMN oxidoreductase 

384. Oxidoreductase FAD/NAD-binding domain 
Number of members: 250 

[1] 

Medline: 92084635 

The sequence of squash NADH:nitrate reductase and its
relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide
cytochrome reductases.
Hyde GE, Crawford NM, Campbell W;
J Biol Chem 1991;266:23542-23547.

[2]

Medline: 95111952
Crystal structure of the FAD-containing fragment of corn 
nitrate reductase at 2.5 A resolution: relationship to other 
flavoprotein reductases. 
Lu G, Campbell WH, Schneider G, Lindqvist Y; 

385. (oxidored_molyb) Eukaryotic molybdopterin oxidoreductases signature
A number of different eukaryotic oxidoreductases that require and bind a 
molybdopterin cofactor have been shown [1] to share a few regions of sequence 
similarity. These enzymes are: 

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of
xanthine to uric acid with the concomitant reduction of NAD. Structurally,
this enzyme of about 1300 amino acids consists of at least three distinct
domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain
(see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal
Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into
acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its
sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate
to nitrite. Structurally, this enzyme of about 900 amino acids consists of 
an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding 
domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome 
reductase domain. 

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to 
sulfate. Structurally, this enzyme of about 460 amino acids consists of an 
N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain. 

There are a few conserved regions in the sequence of the molybdopterin-binding 
domain of these enzymes. The pattern used to detect these proteins is based 
on one of them. It contains a cysteine residue which could be involved in 
binding the molybdopterin cofactor. 

Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFWS]-x(8)-[LIVMF]-x-C-x(2)-
[DEN]-R-x(2)-[DE] 
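
The consensus patterns in this listing follow the PROSITE notation: elements are separated by hyphens, x stands for any residue, square brackets list the residues accepted at a position, curly braces list excluded residues, and a number or range in parentheses gives a repeat count. The following is a minimal sketch, assuming Python and its standard re module, of how such a pattern could be translated into a regular expression and applied to a sequence. The function name prosite_to_regex and the toy sequence are illustrative assumptions and are not part of the specification.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (3) or (11,14).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# The molybdopterin oxidoreductase pattern quoted above.
MOLYBDOPTERIN = ("[GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFWS]-x(8)-[LIVMF]-x-C-x(2)-"
                 "[DEN]-R-x(2)-[DE]")
regex = prosite_to_regex(MOLYBDOPTERIN)

# Hypothetical sequence constructed to satisfy the pattern; not a real protein.
toy = ("MM" + "G" + "AAA" + "K" + "A" * 12 + "L" + "A" * 8 + "L" + "A" + "C"
       + "AA" + "DR" + "AA" + "E" + "KK")
hit = re.search(regex, toy)
print(regex)
print(hit.start(), hit.end())
```

A regular expression reports only whether and where a pattern matches; it does not by itself say anything about the biological significance of a hit.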

[ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray 
R.C. Biochim. Biophys. Acta 1057:157-185(1991).

386. (oxidored q1) NADH-ubiquinone/plastoquinone (complex I), various chains
This family is part of complex I, which catalyses the
transfer of two electrons from NADH to ubiquinone in a
reaction that is associated with proton translocation
across the membrane.
Number of members: 1824

[1] 

Medline: 93110040 

The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains. Walker JE; 
Q Rev Biophys 1992;25:253-324. 

387. (oxidored q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members. 

388. (oxidored q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus 
[1] Walker JE; Q Rev Biophys 1992;25:253-324.

389. (oxidored q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature 
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex 
I or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex
located in the inner mitochondrial membrane which also seems to exist in
the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase).
Among the 25 to 30 polypeptide subunits of this bioenergetic enzyme complex 
there is one with a molecular weight of 20 Kd (in mammals) [3], which is a 
component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 
4Fe-4S iron-sulfur cluster. The 20 Kd subunit has been found to be: 

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and
in Neurospora crassa.

- Mitochondrial encoded in Paramecium (gene psbG).

- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.

- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB). 

- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).

- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern a highly conserved region was selected, located in the
central section of this subunit and which contains a conserved cysteine that 
is probably involved in the binding of the 4Fe-4S center. 

Consensus pattern: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-
[PT] [The C is a putative 4Fe-4S ligand]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

390. p53 tumor antigen signature 

The p53 tumor antigen [1 to 5, E1,E2] is a protein found in increased amounts 
in a wide variety of transformed cells. It is also detectable in many 
proliferating nontransformed cells, but it is undetectable or present at low 
levels in resting cells. It is frequently mutated or inactivated in many types 
of cancer. p53 seems to act as a tumor suppressor in some, but probably not 
all, tumor types. p53 is probably involved in cell cycle regulation, and may 
be a trans-activator that acts to negatively regulate cellular division by 
controlling a set of genes required for this process. 

p53 is a phosphoprotein of about 390 amino acids which can be subdivided into 
four domains: a highly charged acidic region of about 75 to 80 residues, a 
hydrophobic proline-rich domain (position 80 to 150), a central region (from 
150 to about 300), and a highly basic C-terminal region. The sequence of p53
is well conserved in vertebrate species; attempts to identify p53 in other
eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53 a perfectly conserved stretch of 13
residues located in the central region of the protein was selected. This
region, known as
domain IV in [3], is involved (along with an adjacent region) in the binding 
of the large T antigen of SV40. In man this region is the focus of a variety 
of point mutations in cancerous tumors. 

Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R 
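
Because this signature has no variable positions, a sequence can be compared with it residue by residue, which also shows where a point substitution falls within the conserved stretch. The following is a minimal sketch under that assumption, in Python; the function name closest_window and both example sequences are hypothetical and are used only for illustration.

```python
P53_SIGNATURE = "MCNSSCMGGMNRR"  # consensus pattern from the text, hyphens removed

def closest_window(sequence: str, motif: str = P53_SIGNATURE):
    """Return (start, differences) for the window of `sequence` nearest to `motif`.

    Each difference is (0-based position in sequence, expected residue,
    observed residue). Ties are resolved in favour of the first window.
    """
    if len(sequence) < len(motif):
        raise ValueError("sequence is shorter than the motif")
    best_start, best_diffs = 0, None
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        diffs = [(start + i, want, got)
                 for i, (want, got) in enumerate(zip(motif, window))
                 if want != got]
        if best_diffs is None or len(diffs) < len(best_diffs):
            best_start, best_diffs = start, diffs
    return best_start, best_diffs

# Hypothetical sequences for illustration only.
wild_type = "AAAA" + P53_SIGNATURE + "TTT"
variant = "AAAA" + "MCNSSCMGSMNRR" + "TTT"   # one substitution inside the motif
print(closest_window(wild_type))
print(closest_window(variant))
```

An empty difference list means the signature is present unchanged; a short list locates the substituted residues relative to the start of the sequence.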

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991). 

[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990). 

[ 3] Soussi T., Caron De Fromentel C., May P. Oncogene 5:945-952(1990).

[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).

[ 5] Ulrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).

391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the
enzyme that catalyzes the terminal step in the biosynthesis of proline from
glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate into
proline.

The sequences of P5CR from eubacteria (gene proC), archaebacteria and 
eukaryotes show only a moderate level of overall similarity. As a signature 
pattern, the best conserved region located in the C-terminal 
section of P5CR was selected. 

Consensus pattern: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T- 
[AG]-[LIV]-x(2)-[LMF]-[DENQK] 

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990). 
[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990). 



392. Poly-adenylate binding protein, unique domain.

393. (PAL) Phenylalanine and histidine ammonia-lyases active site
Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and
fungal phenylpropanoid metabolism which is involved in the biosynthesis of a
wide variety of secondary metabolites such as flavonoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important 
roles in plants during normal growth and in responses to environmental stress. 
PAL catalyzes the removal of an ammonia group from phenylalanine to form 
trans-cinnamate. 

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in 
histidine degradation, the removal of an ammonia group from histidine to 
produce urocanic acid. 

The two types of enzymes are functionally and structurally related [1]. They 
are the only enzymes which are known to have the modified amino acid
dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to
be the precursor of this essential electrophilic moiety. The region around 
this active site residue is well conserved and can be used as a signature 
pattern. 

Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is 
the active site residue] 
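
Where a pattern annotates a particular residue, as the serine is annotated here, a named capture group can be used to report the position of that residue rather than the position of the whole match. The following is a minimal sketch in Python, assuming a hand translation of the consensus pattern above; the names PAL_HAL_SITE and active_site_positions and the example sequences are illustrative assumptions.

```python
import re

# Hand translation of the consensus pattern above; the named group marks the
# serine annotated as the active site residue.
PAL_HAL_SITE = re.compile(
    r"G[STG][LIVM][STG][AC](?P<active>S)G[DH]L.PL[SA].{2}[SA]"
)

def active_site_positions(sequence: str):
    """Return the 1-based position of the annotated serine for every match."""
    return [m.start("active") + 1 for m in PAL_HAL_SITE.finditer(sequence)]

# Hypothetical sequences for illustration only.
with_site = "MK" + "GTITASGDLVPLSYIA" + "GL"
without_site = "MK" + "GTITAAGDLVPLSYIA" + "GL"   # serine replaced by alanine
print(active_site_positions(with_site))
print(active_site_positions(without_site))
```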

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes
R.R. J. Biol. Chem. 265:18192-18199(1990).

[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).

[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).

[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

394. PAS domain 
-!- CAUTION. This family does not currently match all known 
examples of PAS domains. 

PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308 

[1] 

Medline: 97446881 

PAS domain S-boxes in archaea, bacteria and sensors for 
oxygen and redox. 
Zhulin IB, Taylor BL, Dixon R; 
Trends Biochem Sci 1997;22:331-333. 
[2]Medline: 95275818 
1.4 A structure of photoactive yellow protein, a cytosolic 
photoreceptor: unusual fold, active site, and chromophore. 
Borgstahl GE, Williams DR, Getzoff ED; 
Biochemistry 1995;34:6278-6287. 
[3]Medline: 98044337 
PAS: a multifunctional domain family comes to light. 
Ponting CP, Aravind L; 
Curr Biol 1997;7:674-677. 

395. (PBP) Phosphatidylethanolamine-binding protein family signature 
Mammalian phosphatidylethanolamine-binding protein (also known as basic
cytosolic 21 Kd protein) is a 186 residue protein found in a variety of
tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine,
but also seems [2] to bind nucleotides such as GTP and FMN. It has been
suggested that it could act in membrane remodeling during growth and
maturation. This
protein belongs to a family that also includes: 

- Drosophila antennal protein A5, a putative odorant-binding protein. 

- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.

- Plasmodium falciparum putative phosphatidylethanolamine-binding protein. 

- Toxocara canis secreted antigen TES-26. This larval protein has been shown 
to bind phosphatidylethanolamine. 

- Yeast protein DKA1 (also known as NSP1 or TFS1). The function of this
protein is not very clear.

- Yeast hypothetical protein YLR179C.

- Caenorhabditis elegans hypothetical protein F40A3.3.

As a signature pattern, the best conserved region was selected; it is located
at the end of the first third of the sequence of these proteins.

Consensus pattern: [FYL]-x-[LV]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H 

[ 1] Seddiqi N., Bollengier F., Alliel P.M., Perin J.P., Bonnet F., Bucquoy S., Jolles P.,
Schoentgen F. J. Mol. Evol. 39:655-660(1994).

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-6(1995). 

396. PCI domain 

This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49 

[1] 

Medline: 98308842 

The PCI domain: a common theme in three multiprotein 
complexes. 
Hofmann K, Bucher P; 
Trends Biochem Sci 1998;23:204-205. 
[2]Medline: 98266368 
Homologues of 26S proteasome subunits are regulators of 
transcription and translation. 
Aravind L, Ponting CP; 
Protein Sci 1998;7:1250-1254. 

397. (PCMT) Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature
Protein-L-isoaspartate (D-aspartate) O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is
also known as L-isoaspartyl protein carboxyl methyltransferase) is an enzyme that catalyzes
the transfer of a methyl group from S-adenosylmethionine to the free carboxyl groups of
D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does
not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl are the products of the
spontaneous deamidation and/or isomerization of normal L-aspartyl and L-asparaginyl
residues in proteins. PCMT plays a role in the repair and/or degradation of these damaged
proteins; the enzymatic methyl esterification of the abnormal residues can lead to their
conversion to normal L-aspartyl residues. PCMT is a well-conserved and widely distributed
cytosolic protein of about 24 Kd. As a signature pattern, a conserved region in the central
part of this enzyme was selected.

Consensus pattern: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I

[ 1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C., Clarke S. Comp. Biochem.
Physiol. 117B:379-385(1997).

398. (PCNA) Proliferating cell nuclear antigen signatures 

Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA
replication by acting as a cofactor for DNA polymerase delta, the
polymerase responsible for leading strand DNA replication.
A similar protein exists in yeast (gene POL30) [3] and is associated with
polymerase III, the yeast analog of polymerase delta. In baculoviruses the
ETL protein has been shown [4] to be highly related to PCNA and is probably
associated with the viral encoded DNA polymerase. A homolog of PCNA is also
found in archaebacteria.

As signatures for this family of proteins, two conserved regions were selected 
located in the N-terminal section. The second one has been proposed to bind 
DNA. 

Consensus pattern: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]- 
x-[LY]-[VGA]-x-[LIVM]-x-[LIVM]-x(4)-F 

-Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]- 
x-K-[LIVMF](2) 
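
When a family is described by more than one signature, as here, each pattern can be searched independently and the results combined. The following is a minimal sketch in Python, assuming hand translations of the two consensus patterns above; the names PCNA_SIGNATURES, scan_signatures and has_all_signatures, and the test sequences, are illustrative assumptions rather than anything defined in the specification.

```python
import re

# Hand translations of the two consensus patterns listed above.
PCNA_SIGNATURES = {
    "PCNA_1": re.compile(r"[GA][LIVMF].[LIVMA].[SAV][LIVM]D.[NSAE][HKR][VI]"
                         r".[LY][VGA].[LIVM].[LIVM].{4}F"),
    "PCNA_2": re.compile(r"[RKA]C[DE][RH].{3}[LIVMF].{3}[LIVM].[SGAN][LIVMF]"
                         r".K[LIVMF]{2}"),
}

def scan_signatures(sequence: str) -> dict:
    """Map each signature name to the 0-based start positions of its matches."""
    return {name: [m.start() for m in regex.finditer(sequence)]
            for name, regex in PCNA_SIGNATURES.items()}

def has_all_signatures(sequence: str) -> bool:
    """True when every signature matches at least once."""
    return all(scan_signatures(sequence).values())

# Hypothetical sequences constructed to satisfy the patterns; not real proteins.
first_only = "MM" + "GLALASLDANHVALVALALAAAAF" + "QQ"
both = first_only + "RCDRAAALAAALASLAKLL"
print(scan_signatures(both))
print(has_all_signatures(first_only))
```

Whether one signature or both should be required for assignment to the family is a design choice that this sketch leaves to the caller.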

[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987). 
[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).
[ 3] Bauer G.A., Burgers P.M.J. Nucleic Acids Res. 18:261-265(1990).
[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).

399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of 
prephenate into phenylpyruvate. In microorganisms PDT is involved in the 
terminal pathway of the biosynthesis of phenylalanine. In some bacteria such 
as Escherichia coli PDT is part of a bifunctional enzyme (P-protein) that also 
catalyzes the transformation of chorismate into prephenate (chorismate 
mutase) while in other bacteria it is a monofunctional enzyme. The sequence of
monofunctional PDT aligns well with the C-terminal part of that of P-proteins
[1].

As signature patterns for PDT two conserved regions were selected. The first 
region contains a conserved threonine which has been said to be essential for 
the activity of the enzyme in E. coli. The second region includes a conserved 
glutamate. Both regions are in the C-terminal part of PDT. 

Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-
[LIVM] 

[ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991). 

400. PDZ domain (Also known as DHR or GLGF). 

PDZ domains are found in diverse signaling proteins. 

[1] Ponting CP, Phillips C, Davies KE, Blake DJ;
Bioessays 1997;19:469-479.
[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R;
Cell 1996;85:1067-1076.
[3] Ponting CP;
Protein Sci 1997;6:464-468.

401. (PPDK_N_term) PEP-utilizing enzymes signatures 

A number of enzymes that catalyze the transfer of a phosphoryl group from 
phosphoenolpyruvate (PEP) via a phospho-histidine intermediate have been shown 
to be structurally related [1,2,3,4]. These enzymes are: 

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the
reversible phosphorylation of pyruvate and phosphate by ATP to PEP and
diphosphate. In plants PPDK functions in the direction of the formation of
PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean 
acid metabolism plants. In some bacteria, such as Bacteroides symbiosus, 
PPDK functions in the direction of ATP synthesis. 

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This 
enzyme catalyzes the reversible phosphorylation of pyruvate by ATP to form 
PEP, AMP and phosphate, an essential step in gluconeogenesis when pyruvate 
and lactate are used as a carbon source. 

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the 
first enzyme of the phosphoenolpyruvate-dependent sugar phosphotransferase 
system (PTS), a major carbohydrate transport system in bacteria. The PTS 
catalyzes the phosphorylation of incoming sugar substrates concomitant 
with their translocation across the cell membrane. The general mechanism 
of the PTS is the following: a phosphoryl group from PEP is transferred
to enzyme-I (EI) of PTS which in turn transfers it to a phosphoryl carrier
protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar- 
specific permease. 

All these enzymes share the same catalytic mechanism: they bind PEP and 
transfer the phosphoryl group from it to a histidine residue. The sequence 
around that residue is highly conserved and can be used as a signature pattern 
for these enzymes. As a second signature pattern a conserved
region was selected in the C-terminal part of the PEP-utilizing enzymes. The
biological significance of this region is not yet known.

Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is 
phosphorylated] 

-Consensus pattern: [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q- 
[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R 

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).

[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene
181:103-108(1996).

[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D.
Biochemistry 29:10757-10765(1990).

[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet.
232:332-336(1992).

402. (PEPCK ATP) Phosphoenolpyruvate carboxykinase (ATP) signature 
Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes
the formation of phosphoenolpyruvate by decarboxylation of oxaloacetate while
hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of
glucose).

The sequence of this enzyme has been obtained from Escherichia coli, yeast,
and Trypanosoma brucei; these three sequences are evolutionarily related and
share many regions of similarity. As a signature pattern a highly
conserved region was selected that contains four acidic residues and which is
located in the central part of the enzyme. The beginning of the pattern is
located about 10 residues to the C-terminus of an ATP-binding motif 'A'
(P-loop) (see <PDOC00017>) and is also part of the ATP-binding domain [2].

Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

-Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes
the same reaction, but using GTP instead of ATP, is not related to the above enzyme (see
<PDOC00421>).

[ 1] Medina V., Pontarollo R., Glaeske D., Tabel H., Goldie H. J. Bacteriol. 172:7151-7156(1990).

[ 2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

403. (Pepcase) Phosphoenolpyruvate carboxylase active sites
Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPcase) catalyzes the irreversible
beta-carboxylation of phosphoenolpyruvate by bicarbonate to yield oxaloacetate and
phosphate. The enzyme is found in all plants and in a variety of microorganisms. A
histidine [1] and a lysine [2] have been implicated in the catalytic mechanism of this
enzyme; the regions around these active site residues are highly conserved in PEPcase from
various plants, bacteria and cyanobacteria and can be used as signature patterns for this
type of enzyme.

Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]

Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site
residue]

[ 1] Terada K., Izui K. Eur. J. Biochem. 202:797-803(1991).
[ 2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta
1041:291-295(1990).

404. PET112 family signature 

The following proteins from eukaryotes, prokaryotes and archaebacteria belong 
to the same family: 

- Yeast mitochondrial protein PET112 [1], which plays an unknown role in the 
expression of mitochondrial genes, probably at the level of translation. 

- Aspergillus nidulans mitochondrial protein nempA. 

- Bacillus subtilis hypothetical protein yzdD. 

- Moraxella catarrhalis hypothetical protein in bloR-1 3' region.

- Mycoplasma genitalium hypothetical protein MG100. 

- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

The size of these proteins ranges from 419 to 630 amino acids. As a signature
pattern, a conserved region located in the N-terminal section was selected.

Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P

[ 1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).

405. (PFK) Phosphofructokinase signature 

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in 
the glycolytic pathway. It catalyzes the phosphorylation by ATP of fructose 
6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of 
identical 36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each
80 Kd subunit consists of two homologous domains which are highly related to
the bacterial 36 Kd subunits. In humans there are three tissue-specific types
of PFK isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast
PFK is an octamer composed of four 100 Kd alpha chains (gene PFK1) and four
100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast
100 Kd subunits are composed of two homologous domains.

As a signature pattern for PFK a region that contains three basic
residues involved in fructose-6-phosphate binding was selected.

Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R
are involved in fructose-6-P binding]

-Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes
pfkA (major) and pfkB (minor). The pfkB isozyme is not evolutionarily related to other
prokaryotic or eukaryotic PFK's (see <PDOC00504>).

[ 1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).
[ 2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K.
Gene 78:309-321(1989).

406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature 
Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase
(EC 5.4.2.4) (BPGM) are structurally related enzymes which catalyze reactions
involving the transfer of phospho groups between the three carbon atoms of
phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions,
although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate
(3-PGA) with 2,3-diphosphoglycerate (2,3-DPG) as the primer of the reaction.

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M 
(muscle) and B (brain) forms. In yeast, PGAM is a tetrameric protein. BPGM is 
a dimeric protein and is found mainly in erythrocytes where it plays a major 
role in regulating hemoglobin oxygen affinity as a consequence of controlling
2,3-DPG concentration.

The catalytic mechanism of both PGAM and BPGM involves the formation of a
phosphohistidine intermediate [3].

The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase
(EC 2.7.1.105 and EC 3.1.3.46) (PF2K) [4] catalyzes both the synthesis and the
degradation of fructose-2,6-bisphosphate. PF2K is an important enzyme in the
regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the
fructose-2,6-bisphosphatase reaction involves a phosphohistidine intermediate and the
phosphatase domain of PF2K is structurally related to PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC) which
is involved in cobalamin biosynthesis also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue.

Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

-Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is
not related to the family described above [6].

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res.
Commun. 156:874-881(1988).
[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).

[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J.,
Puigdomenech P., Pilkis S.J., Climent F. J. Biol. Chem. 267:12797-12803(1992).

407. (PGI) Phosphoglucose isomerase signatures

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that 
catalyzes the reversible isomerization of glucose-6-phosphate and
fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it
is involved in glycolysis; in mammals it is involved in gluconeogenesis; in
plants in carbohydrate biosynthesis; in some bacteria it provides a gateway
for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be
identical to neuroleukin, a neurotrophic factor which supports the survival of
various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is
available and has been shown to be highly conserved. As signature patterns for
this enzyme two conserved regions were selected: the first region is located in
the central section of PGI, while the second one is located in its C-terminal
section.

Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-
[LIVMA]-G 

-Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K 

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans.
R. Soc. Lond., B, Biol. Sci. 293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).

[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in 
the second phase of glycolysis, the reversible conversion of
1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK
is found in all living organisms and its sequence has been highly conserved
throughout evolution. It is a two-domain protein; each domain is composed of
six repeats of an alpha/beta structural motif. As a signature pattern for
PGK's, a conserved region in the N-terminal region was selected.

Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P

[ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990). 



409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for
the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM
participates in both the breakdown and synthesis of glucose [1].

- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for
the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is
required for different biosynthetic pathways in bacteria. For example, in
enterobacteria such as Escherichia coli there are two different genes
coding for this enzyme: rfbK which is involved in the synthesis of the O
antigen of lipopolysaccharide and cpsG which is required for the synthesis
of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM
(gene algC) is involved in the biosynthesis of the alginate layer [3] and
in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis of
xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the
biosynthesis of the nod factor.

- Phosphoacetylglucosamine mutase (EC 5.4.2.3) which converts
N-acetyl-D-glucosamine 1-phosphate into the 6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a
phosphoserine intermediate [1]. The sequence around the serine residue is well
conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins
that belong to this family [5,6]:

- Urease operon protein ureC from Helicobacter pylori. 

- Escherichia coli protein mrsA. 

- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in
exocytosis.

- A Methanococcus vannielii hypothetical protein in the 3' region of the gene
for ribosomal protein S10.

Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE] [S is the
phosphoserine residue]

-Note: PMM from fungi does not belong to this family.

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).

[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).

[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).
[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol.
174:191-199(1992).

[ 5] Bairoch A. Unpublished observations (1993). 

[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 
91:9832-9836(1994). 

410. PH domain profile 

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that 
occurs in a wide range of proteins involved in intracellular signaling or as 
constituents of the cytoskeleton [1 to 7]. 

The function of this domain is not clear; several putative functions have been
suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins,

- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate, 

- binding to phosphorylated Ser/Thr residues, 

- attachment to membranes by an unknown mechanism. 

It is possible that different PH domains have totally different ligand 
requirements. 

The 3D structure of several PH domains has been determined [8]. All known 
cases have a common structure consisting of two perpendicular anti-parallel 
beta sheets, followed by a C-terminal amphipathic helix. The loops connecting 
the beta-strands differ greatly in length, making the PH domain relatively 
difficult to detect. There are no totally invariant residues within the PH 
domain. 

Proteins reported to contain one or more PH domains belong to the following
families:

- Pleckstrin, the protein where this domain was first detected, is the major
substrate of protein kinase C in platelets. Pleckstrin is one of the rare
proteins to contain two PH domains.

- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic
receptor kinases, the mu isoform of PKC and the trypanosomal NrkA family.

- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.

- Insulin Receptor Substrate 1 (IRS-1). 

- Regulators of small G-proteins like guanine nucleotide releasing factor 
GNRP (Ras-GRF) (which contains 2 PH domains), guanine nucleotide exchange 
proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins 

like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr. 

- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis 
elegans kinesin-like protein unc-104 (see <PDOC00343>), spectrin beta- 
chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1. 

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see
<PDOC50007>) isoforms gamma and delta. Isoform gamma contains two PH
domains; the second one is split into two parts separated by about 400
residues.

- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.

- Mouse protein citron, a putative rho/rac effector that binds to the GTP-
bound forms of rho and rac.

- Several yeast proteins involved in cell cycle regulation and bud formation
like BEM2, BEM3, BUD4 and the BEM1-binding proteins BOI2 (BEB1) and BOI1
(BOB1).

- Caenorhabditis elegans protein MIG-10.

- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12. 

- Yeast hypothetical proteins YBR129c and YHR155w. 

The profile for the PH domain, which has been developed by Toby Gibson at the
EMBL, covers the total length of the domain. Several proteins contain large
insertions in the PH domain and are thus difficult to detect with this 
profile. In some of these cases, the profile will align only to one half of 
the PH domain. 

- Sequences known to belong to this class detected by the pattern: ALL. Note,
however, that while all sequences containing PH domains are detected, not all
PH domains are: some of the split domains lie below the cutoff threshold.
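
The distinction drawn above between detecting every sequence and detecting every domain can be illustrated with a small sketch. The protein names, per-domain scores and cutoff below are hypothetical values chosen only for illustration; they are not taken from the profile itself.

```python
# Hypothetical profile scores for each PH domain, keyed by protein name.
# A split domain is assumed here to score lower than an intact one.
CUTOFF = 8.0  # assumed threshold, for illustration only

domain_scores = {
    "protein_A": [12.5],        # one intact PH domain
    "protein_B": [11.0, 6.2],   # one intact domain plus one split domain
    "protein_C": [9.4, 10.1],   # two intact domains
}


def detection_summary(scores, cutoff):
    """Return (fraction of proteins detected, fraction of domains detected)."""
    proteins_hit = sum(any(s >= cutoff for s in doms) for doms in scores.values())
    all_domains = [s for doms in scores.values() for s in doms]
    domains_hit = sum(s >= cutoff for s in all_domains)
    return proteins_hit / len(scores), domains_hit / len(all_domains)


seq_sens, dom_sens = detection_summary(domain_scores, CUTOFF)
print(f"sequences detected: {seq_sens:.0%}, domains detected: {dom_sens:.0%}")
```

In this toy data set every protein is reported because at least one of its domains scores above the cutoff, while the split domain of protein_B is missed.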

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).
[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).
[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M.
Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E.
Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger 
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated 
transcriptional regulation. 
Aasland R, Gibson TJ, Stewart AF; 
Trends Biochem Sci 1995;20:56-59. 
Number of members: 181 

412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles 
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic
intracellular enzyme, plays an important role in signal transduction processes
[1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-
triphosphate into the second messenger molecules diacylglycerol and inositol-
1,4,5-triphosphate. This catalytic process is tightly regulated by reversible
phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in
their domain structure, their regulation, and their tissue distribution. Lower 
eukaryotes also possess multiple isoforms of PI-PLC. 

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to
as 'X-box' and 'Y-box'. The order of these two regions is always the same
(NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance 
between these two regions is only 50-100 residues but in the gamma isoforms 
one PH domain, two SH2 domains, and one SH3 domain are inserted between the 
two PLC-specific domains. The two conserved regions have been shown to be 
important for the catalytic activity. At the C-terminal of the Y-box, there is
a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane
attachment. 
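
The fixed NH2-X-Y-COOH order and the variable spacing described above can be sketched as a simple coordinate check. The `Region` class, the function names and all residue coordinates below are hypothetical and serve only as an illustration; the 50-100 residue range is the one quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def xy_spacing(x_box: Region, y_box: Region) -> int:
    """Number of residues lying between the X-box and the Y-box."""
    if x_box.end >= y_box.start:
        raise ValueError("expected the X-box before the Y-box (NH2-X-Y-COOH)")
    return y_box.start - x_box.end - 1


def describe(spacing: int) -> str:
    # 50-100 residues is the range quoted in the text for most isoforms.
    if 50 <= spacing <= 100:
        return "short linker, as in most isoforms"
    if spacing > 100:
        return "long insert, as described for the gamma isoforms"
    return "shorter than the quoted range"


# Hypothetical coordinates, chosen only to illustrate the two cases.
typical = xy_spacing(Region("X", 300, 450), Region("Y", 520, 640))
gamma_like = xy_spacing(Region("X", 320, 465), Region("Y", 950, 1070))
print(typical, describe(typical))
print(gamma_like, describe(gamma_like))
```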

Profile analysis shows that sequences with significant similarity 
to the X-box domain occur also in prokaryotic and trypanosome PI-specific
phospholipases C. Apart from this region, the prokaryotic enzymes show no 
similarity to their eukaryotic counterparts. 

Two profiles were developed, one covering the X-box, the other the Y-box. 

[ 1] Meldrum E., Parker P.J., Carozzi A.
Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D.
Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992). 

413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles 
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic
intracellular enzyme, plays an important role in signal transduction processes
[1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-
triphosphate into the second messenger molecules diacylglycerol and inositol-
1,4,5-triphosphate. This catalytic process is tightly regulated by reversible
phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in
their domain structure, their regulation, and their tissue distribution. Lower 
eukaryotes also possess multiple isoforms of PI-PLC. 

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to 
as 'X-box' and 'Y-box'. The order of these two regions is always the same
(NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance 
between these two regions is only 50-100 residues but in the gamma isoforms 
one PH domain, two SH2 domains, and one SH3 domain are inserted between the 
two PLC-specific domains. The two conserved regions have been shown to be 
important for the catalytic activity. At the C-terminal of the Y-box, there is 
a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane 
attachment. 

Profile analysis shows that sequences with significant similarity
to the X-box domain occur also in prokaryotic and trypanosome PI-specific
phospholipases C. Apart from this region, the prokaryotic enzymes show no 
similarity to their eukaryotic counterparts. 

Two profiles were developed, one covering the X-box, the other the Y-box. 

[ 1] Meldrum E., Parker P.J., Carozzi A.
Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D.
Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992). 



414. (PK) Pyruvate kinase active site signature 

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, 
the conversion of phosphoenolpyruvate to pyruvate with the concomitant 
phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions 
for its activity. PK is found in all living organisms. In vertebrates there
are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle,
heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are
two isozymes: PK-I (gene pykF) and PK-II (gene pykA). All PK isozymes seem to
be tetramers of identical subunits of about 500 amino acid residues.

As a signature pattern for PK a conserved region was selected that includes a
lysine residue which seems to be the acid/base catalyst responsible for the 
interconversion of pyruvate and enolpyruvate, and a glutamic acid residue 
implicated in the binding of the magnesium ion. 

Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]- 
[GSTA]-[LIVM] [K is the active site residue] [E is a magnesium ligand] 
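
A consensus pattern written in this notation can be translated mechanically into a regular expression. The following is a minimal sketch, assuming only the elements used in this document (single residues, bracketed alternatives, braces for excluded residues, `x` for any residue, and repeat counts in parentheses). The function name `prosite_to_regex` and the toy peptide are invented for illustration and do not come from any published tool or real protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as 'K-[LIV]-x(2)' into a regex string."""
    element_re = r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":                 # x stands for any residue
            core = "."
        elif core.startswith("{"):      # {AB} means any residue except A or B
            core = "[^" + core[1:-1] + "]"
        if lo is not None:              # (n) or (n,m) repeat count
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


# Pyruvate kinase signature as printed above, joined onto one line.
PK_PATTERN = "[LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM]"
pk_regex = re.compile(prosite_to_regex(PK_PATTERN))

# Invented toy sequence: filler residues around a peptide built to fit the pattern.
toy = "MMM" + "LAIVSKIENADGL" + "QQQ"
hit = pk_regex.search(toy)
lysine_index = hit.start() + 5  # K is the sixth position of the pattern
print(pk_regex.pattern, hit.group(), lysine_index)
```

The offset of 5 follows from counting the pattern positions up to the active site lysine; a real scan would of course be run on database sequences rather than on an invented peptide.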

[ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990). 



415. (PLDc) Phospholipase D. Active site motif 
Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are 
activated by ADP-ribosylation factors (ARFs). PLD produces phosphatidic
acid from phosphatidylcholine, which may be essential for the formation
of certain types of transport vesicles or may be constitutive vesicular 
transport to signal transduction pathways. 
PC-hydrolyzing PLD is a homologue of cardiolipin synthase, 
phosphatidylserine synthase, bacterial PLDs, and viral proteins. 
Each of these appears to possess a domain duplication which is apparent 
by the presence of two motifs containing well-conserved histidine, lysine, 
and/or asparagine residues which may contribute to the active site, 
aspartic acid. An E. coli endonuclease (nuc) and similar proteins appear 
to be PLD homologues but possess only one of these motifs. 
The profile contained here represents only the putative active site 
regions, since an accurate multiple alignment of the repeat units 
has not been achieved. 
Number of members: 139 

[1] 

Medline: 96303814 

A novel family of phospholipase D homologues that includes 
phospholipid synthases and putative endonucleases: 
identification of duplicated repeats and potential active 
site residues. 
Ponting CP, Kerr ID; 

Protein Sci 1996;5:914-922. 

[2] Medline: 96334293
A duplicated catalytic motif in a new superfamily of 
phosphohydrolases and phospholipid synthases that includes 
poxvirus envelope proteins. 
Koonin EV; 

Trends Biochem Sci 1996;21:242-243. 

[3] Medline: 94327597
Cloning and expression of phosphatidylcholine-hydrolyzing 
phospholipase D from Ricinus communis L. 
Wang X, Xu L, Zheng L; 

J Biol Chem 1994;269:20312-20317. 

[4] Medline: 97386825
Regulation of eukaryotic phosphatidylinositol-specific
phospholipase C and phospholipase D. 
Singer WD, Brown HA, Sternweis PC; 
Annu Rev Biochem 1997;66:475-509. 

416. (PMI typeI) Phosphomannose isomerase type I signatures
Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes 
the interconversion of mannose-6-phosphate and fructose-6-phosphate. In 
eukaryotes, it is involved in the synthesis of GDP-mannose which is a 
constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, 
it is involved in a variety of pathways including capsular polysaccharide 
biosynthesis and D-mannose metabolism. 

Three classes of PMI have been defined on the basis of sequence similarities 
[1]. The first class comprises all known eukaryotic PMI as well as the enzyme 
encoded by the manA gene in enterobacteria such as Escherichia coli. Class I 
PMIs are proteins of about 42 to 50 Kd which bind a zinc ion essential for
their activity. 

As signature patterns for class I PMI, two conserved regions were selected. The 
first one is located in the N-terminal section of these proteins, the second 
in the C-terminal half. Both patterns contain a residue involved [3] in the 
binding of the zinc ion. 

Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand] 

-Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A- 
G-x-T-P-K [H is a zinc ligand] 
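
Because the first signature lies in the N-terminal section and the second in the C-terminal half, a scan can require both hits and check their order. The sketch below uses regular expressions written by hand from the two consensus patterns above; the function name and the toy sequence (poly-Gly filler around two invented matching peptides) are assumptions made for illustration only.

```python
import re

# Regular expressions written by hand from the two consensus patterns above.
N_TERMINAL = re.compile(r"Y.D.NHKPE")
C_TERMINAL = re.compile(r"HAY[LIVM].G.{2}[LIVM]E.MA.SDN.[LIVM]RAG.TPK")


def class_i_pmi_signatures(seq: str):
    """Return (n_hit_start, c_hit_start) if both signatures occur in order, else None."""
    n_hit = N_TERMINAL.search(seq)
    if n_hit is None:
        return None
    # Only look for the second signature downstream of the first one.
    c_hit = C_TERMINAL.search(seq, n_hit.end())
    if c_hit is None:
        return None
    return n_hit.start(), c_hit.start()


# Toy sequence: filler plus two invented peptides built to fit the patterns.
n_peptide = "YADANHKPE"
c_peptide = "HAYLAGAALEAMAASDNALRAGATPK"
toy = "G" * 10 + n_peptide + "G" * 30 + c_peptide + "G" * 5
print(class_i_pmi_signatures(toy))
```

Requiring both hits in the expected order is a stricter test than either pattern alone; whether that strictness is desirable depends on how fragmentary the query sequences are.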

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem.
219:415-423(1994).

[ 2] Coulin F., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. 
Biochemistry 32:14139-14144(1993). 

[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I.,
Bernard A.R., Payton M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

417. (PNP UDP 1) Purine and other phosphorylases family 1 signature
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria
(gene deoD). This enzyme catalyzes the cleavage of guanosine or inosine to
the respective bases and sugar-1-phosphate molecules [1].

- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and 
mammals. Catalyzes the cleavage of uridine into uracil and ribose-1- 
phosphate. The products of the reaction are used either as carbon and 
energy sources or in the rescue of pyrimidine bases for nucleotide 
synthesis [2]. 

- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from
Sulfolobus solfataricus [3]. 

As a signature pattern, a conserved region was selected in the central part of 
these enzymes. 

Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L 

-Note: mammalian and some bacterial PNP, as well as eukaryotic MTA phosphorylase,
belong to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling F., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem.
59:1987-1990(1995).

[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.F., Uchida T. J. Biol. Chem. 270:12191-
12196(1995).

[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem.
269:24762-24769(1994).

418. (PP2C) Protein phosphatase 2C signature 

Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian 
serine/threonine specific protein phosphatases (EC 3.1.3.16). PP2C [1] is a 
monomeric enzyme of about 42 Kd which shows broad substrate specificity and 
is dependent on divalent cations (mainly manganese and magnesium) for its 
activity. Its exact physiological role is still unclear. Three isozymes are
currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are
at least four PP2C homologs: phosphatase PTC1 [2] which has weak tyrosine
phosphatase activity in addition to its activity on serines, phosphatases PTC2
and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known
from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2,
F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3]
is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and
which contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionarily related to the main family of serine/
threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly
similar to the catalytic subunit of pyruvate dehydrogenase phosphatase
(EC 3.1.3.43) (PDPC) [4], which catalyzes dephosphorylation and concomitant
reactivation of the alpha subunit of the E1 component of the pyruvate
dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is 
magnesium-dependent. 

As a signature pattern, the best conserved region was selected which is located 
in the N-terminal part and contains a perfectly conserved tripeptide. This 
region includes a conserved aspartate residue involved in divalent cation 
binding [5]. 

Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

-Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as
Bacillus spoIIE, rsbU and rsbW, Synechocystis PCC 6803 icfG as well as a domain in fungal
adenylate cyclases.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G.
FEBS Lett. 297:135-138(1992). 

[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993). 

[ 3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-
795(1994).

[ 4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry
32:8987-8993(1993).

[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 15:6798-6809(1996).

[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

419. (PPTA) Protein prenyl transferases alpha subunit repeat signature 
Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a
cysteine four residues from the C-terminus of several proteins. They are
heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit
is thought to participate in a stable complex with the isoprenyl substrate;
the beta subunit binds the peptide substrate. Distinct protein
prenyltransferases might share a common alpha subunit. Both the alpha and
beta subunit show repetitive sequence motifs [1]. These repeats have distinct
structural and functional implications and are unrelated to each other. Known
protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.

- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.

- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and
is repeated five times. It contains an invariant tryptophan possibly involved
in heterodimerization with the conserved phenylalanines in the repeated
domains of the beta subunits, via hydrophobic bonds. The signature pattern for
this domain is centered on the invariant tryptophan.

Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-
[LIVMR]

[ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992). 



420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures 
Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in
many aspects of cellular function including the regulation of metabolic
enzymes and proteins involved in signal transduction. PP2A is a trimeric
enzyme that consists of a core composed of a catalytic subunit associated with
a 65 Kd regulatory subunit (PR65), also called subunit A; this complex then
associates with a third variable subunit (subunit B), which confers distinct
properties to the holoenzyme [1]. One of the forms of the variable subunit is
a 55 Kd protein (PR55) which is highly conserved in mammals (where three
isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit
could perform a substrate recognition function or be responsible for targeting
the enzyme complex to the appropriate subcellular compartment.

As signature patterns, two perfectly conserved sequences of 15
residues were selected; one located in the N-terminal region, the other in the center of
the protein.

Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N 

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D 

[ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).

421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase 
[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN; 
J Mol Biol 1992;223:477-507. 

422. (PRK) Phosphoribulokinase signature 

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific 
to the Calvin's reductive pentose phosphate cycle which is the major route by 
which carbon dioxide is assimilated and reduced by autotrophic organisms. PRK 
catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into
ribulose 1,5-bisphosphate which is the substrate for RubisCO.

PRK's of diverse origins show different properties with respect to the size of
the protein, the subunit structure, or the enzymatic regulation. However an 
alignment of the sequences of PRK from plants, algae, photosynthetic and 
chemoautotrophic bacteria shows that there are a few regions of sequence 
similarity. As a signature pattern one of these regions was selected. 



Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).

[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990). 

423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature 
Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) 
catalyzes the formation of PRPP from ATP and ribose 5-phosphate. PRPP is then 
used in various biosynthetic pathways, as for example in the formation of 
purines, pyrimidines, histidine and tryptophan. PRPP synthetase requires 
inorganic phosphate and magnesium ions for its stability and activity. 
In mammals, three isozymes of PRPP synthetase are found; in yeast there are at 
least four isozymes. 

As a signature pattern for this enzyme, a very conserved region was selected 
that has been suggested to be involved in binding divalent cations [1]. This 
region contains two conserved aspartic acid residues as well as a histidine, 
which are all potential ligands for a cation such as magnesium. 

Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D 

[ 1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-
10291(1989). 

424. (PRTP) Herpesvirus processing and transport protein 

The members of this family are associated with capsid intermediates during packaging of the
virus. 

Number of members: 31 
[1] 

Medline: 98362148 

Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging.
Yu D, Weller SK;
J Virol 1998;72:7428-7439.

425. Photosystem I psaG / psaK (PSI PSAK) proteins signature

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to 
mediate electron transfer from plastocyanin to ferredoxin. It is found in the chloroplasts of 
plants and cyanobacteria. PSI is composed of at least 14 different subunits, two of which,
PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about 7 to 9 Kd
and evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria
seem to encode only for PSI-K. 

As a signature pattern, the best-conserved region was selected which seems to 
correspond to the second transmembrane region. 

-Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]- 
[GA] 

[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987). 

[2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem.
268:18912-18916(1993).



426. PTR2 family proton/oligopeptide symporters signatures 
A family of eukaryotic and prokaryotic proteins that seem to be mainly 
involved in the intake of small peptides with the concomitant uptake of a 
proton has been recently characterized [1,2]. Proteins that belong to this
family are:

- Fungal peptide transporter PTR2.

- Mammalian intestine proton-dependent oligopeptide transporter PepT1.

- Mammalian kidney proton-dependent oligopeptide transporter PepT2.

- Drosophila opt1.

- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as 
the histidine transporting protein NTR1). 

- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1. 

- Lactococcus proton-dependent di- and tri-peptide transporter dtpT. 

- Caenorhabditis elegans hypothetical protein C06G8.2. 

- Caenorhabditis elegans hypothetical protein F56F4.5. 



Attorney No. 2750-1237P 

- Caenorhabditis elegans hypothetical protein K04E7.2. 

- Escherichia coli hypothetical protein ybgH. 

- Escherichia coli hypothetical protein ydgR. 

- Escherichia coli hypothetical protein yhiP. 

- Escherichia coli hypothetical protein yjdL. 

- Bacillus subtilis hypothetical protein yclR 

These integral membrane proteins are predicted to comprise twelve 
transmembrane regions. As signature patterns, two of the best
conserved regions were selected. The first is a region that includes the end of the second
transmembrane region, a cytoplasmic loop as well as the third transmembrane 
region. The second pattern corresponds to the core of the fifth transmembrane 
region. 

-Consensus pattern: [GA]-[GAS]-[LIVMFWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-
[LIVMFYW]-G-x(3)-[TAV]-[IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA]

-Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-
[GSA]-[LIMF]

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994). 

[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995). 

427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain) 

Puf domains are necessary and sufficient for sequence specific
RNA binding in fly Pumilio and worm FBF-1 and FBF-2. Both proteins
function as translational repressors in early embryonic development
by binding sequences in the 3' UTR of target mRNAs (e.g. the
nanos response element (NRE) in fly Hunchback mRNA, or the point
mutation element (PME) in worm fem-3 mRNA). Other proteins that contain Puf domains are
also plausible RNA binding proteins. JSN1_YEAST, for instance, appears to also contain a
single RRM domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains.
The Pfam model does not necessarily recognize all 8 domains in
all sequences; some sequences appear to have 5 or 6 domains on
initial analysis, but further analysis suggests the presence
of additional divergent domains.

[1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J,
Wickens MP. Nature 1997;390:477-484.

[2] Zamore PD, Williamson JR, Lehmann R.
RNA 1997;3:1421-1433.

428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. 
The function of the domain is currently unknown. Number of members: 19 

[1] Medline: 98282232. WHSC1, a 90 kb SET domain-containing gene, expressed in early 
development and homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn 
syndrome critical region and is fused to IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, 
van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM, Altherr MR, den Dunnen 
JT; Hum Mol Genet 1998;7:1071-1082. 

429. PX domain 

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K 
isoform. 

Number of members: 71
[1] 

Medline: 97084820 

Novel domains in NADPH oxidase subunits, sorting nexins, and 
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP; 
Protein Sci 1996;5:2353-2357. 

430. ParA family ATPase 
[1] 

Medline: 91141297

A family of ATPases involved in active partitioning of
diverse bacterial plasmids. 
Motallebi-Veshareh M, Rouch DA, Thomas CM; 
Mol Microbiol 1990;4:1455-1463. 
Number of members: 122 

431. (Parvo coat) Parvovirus coat protein. 72 members. 

432. Pectinesterase signatures 

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis 
of pectin into pectate and methanol. In plants, it plays an important role in 
cell wall metabolism during fruit ripening. In plant bacterial pathogens such
as Erwinia carotovora and in fungal pathogens such as Aspergillus niger,
pectinesterase is involved in maceration and soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence
similarity [1,2,3]. Two of these regions were selected as signature patterns.
The first is based on a region in the N-terminal section of these enzymes; it 
contains a conserved tyrosine which may play a role in the catalytic mechanism 
[3]. The second pattern corresponds to the best conserved region, an 
octapeptide located in the central part of these enzymes. 

-Consensus pattern: [GSTNP]-x(6)-[FYVHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y- 
[DNQKRMV]-[EP]-x(3)-[LIMVA] 

-Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G 

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).

[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988). 

[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992). 

433. Pentapeptide repeats (8 copies) 
These repeats are found in many cyanobacterial proteins.
The repeats were first identified in hglK [1]. The function of
these repeats is unknown. 

The structure of this repeat has been predicted to be a 
beta-helix [2]. 

The repeat can be approximately described as A(D/N)LXX, where
X can be any amino acid.

Number of members: 75
[1] 

Medline: 96062225 

The hglK gene is required for localization of 
heterocyst-specific glycolipids in the cyanobacterium 
Anabaena sp. strain PCC 7120. 
Black K, Buikema WJ, Haselkorn R; 
J Bacteriol 1995;177:6440-6448. 
[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in 
bacteria. 

Bateman A, Murzin A, Teichmann SA; 
Protein Sci 1998;7:1477-1480. 
[3] Medline: 98316713
Characterisation of an Arabidopsis cDNA encoding a thylakoid
lumen protein related to a novel 'pentapeptide repeat' family
of proteins. 

Kieselbach T, Mant A, Robinson C, Schroder WP; 
FEBS Lett 1998;428:241-244. 
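
The approximate description A(D/N)LXX given for the pentapeptide repeat can be turned into a simple tandem-repeat counter. This is a minimal sketch: the regular expression is a literal reading of that description, and the function name and toy sequence are invented for illustration. A profile-based search would be expected to be more sensitive than this literal match.

```python
import re

REPEAT = r"A[DN]L.."  # A(D/N)LXX, with X standing for any residue


def longest_tandem_run(seq: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of A(D/N)LXX."""
    runs = re.finditer(f"(?:{REPEAT})+", seq)
    return max((len(m.group()) // 5 for m in runs), default=0)


# Invented toy sequence: three consecutive repeats, a spacer, then one more repeat.
toy = "MK" + "ADLSG" + "ANLTQ" + "ADLRK" + "PPP" + "ANLEE"
print(longest_tandem_run(toy))
```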

434. Polypeptide deformylase 
[1] 

Medline: 97002011 

A new subclass of the zinc metalloproteases superfamily 
revealed by the solution structure of peptide deformylase. 
Meinnel T, Blanquet S, Dardel F; 

J Mol Biol 1996;262:375-386. 

[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T; 
J Mol Biol 1998;280:501-513. 
Number of members: 21 

435. Peptidyl-tRNA hydrolase signatures 

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves 
peptidyl-tRNA or N-acyl-aminoacyl-tRNA to yield free peptides or N-acyl-amino 
acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA
that drops off the ribosome during protein synthesis [1,2]. Bacterial PTH has
been found [2,3] to be evolutionarily related to yeast hypothetical protein
YHR189w. 

PTH and YHR189w are proteins of about 200 amino acid residues. As signature 
patterns, two conserved regions were selected that each contain a histidine.
The first of these regions is located in the N-terminal section, the other in 
the central part. 

-Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE] 
-Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT] 

[ 1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H.,
Guarneros G. EMBO J. 10:3549-3555(1991). 

[ 2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996). 
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).

436. (Peptidase M17) Cytosol aminopeptidase signature 
Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase 
that catalyzes the removal of unsubstituted amino-acid residues from the 
N-terminus of proteins. This enzyme is often known as leucine aminopeptidase 
(EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl 
aminopeptidase (EC 3.4.11.5). Cytosol aminopeptidase is a hexamer of identical 
chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese
dependent aminopeptidase. Residues involved in zinc ion-binding [2] in the
mammalian enzyme are absolutely conserved in pepA where they presumably bind
manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from
Arabidopsis thaliana also belong to this family.

As a signature pattern for these enzymes, a perfectly conserved
octapeptide was selected which contains two residues involved in binding metal ions: an
aspartate and a glutamate.

-Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands] 
-Note: these proteins belong to family M17 in the classification of peptidases [4,E1]. 

[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. 
Biochem. Biophys. Res. Commun. 178:1459-1464(1991). 

[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113- 
140(1992). 

[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 

437. Assemblin (Peptidase family S21) 
[1] 

Medline: 96399137 

Three-dimensional structure of human cytomegalovirus 
protease. 

Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ, 
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC, 
Stallings WC; 
Nature 1996;383:279-282. 
Number of members: 29 



438. Pollen proteins Ole e I family signature 
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The following plant pollen proteins, whose biological function is not yet 
known, are structurally related [1]: 

- Olive tree pollen major allergen (Ole e I). 

- Tomato anther-specific protein LAT52.

- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. 

As shown in the following schematic representation, there are six cysteines
which are conserved in the sequence of these proteins. They seem to be
involved in disulfide bonds.

xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx

'C': conserved cysteine involved in a disulfide bond.

'*': position of the pattern.

-Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two Cs are probably involved in 
disulfide bonds] 

[ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La
Pena M.A., Lahoz C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).

439. Pollen allergen 

This family contains allergens Lol p I, Lol p II and Lol p III from Lolium perenne.
Number of members: 49 
[1] 

Medline: 90105394 

Complete primary structure of a Lolium perenne (perennial rye 
grass) pollen allergen, Lol p III: comparison with known Lol 
p I and II sequences. 
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670. 

440. Porphobilinogen deaminase cofactor-binding site 

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an 
enzyme involved in the biosynthesis of porphyrins and related macrocycles. It
catalyzes the assembly of four porphobilinogen (PBG) units in a head-to-tail
fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG 
subunits are added in a stepwise fashion. In the Escherichia coli enzyme (gene 
hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a 
cysteine. The region around this cysteine is conserved in porphobilinogen 
deaminases from various prokaryotic and eukaryotic sources. 

-Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA]
[C is the cofactor attachment site]

[ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).

441. Presenilin

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has
been found that presenilin-1 (Swiss:P49768) binds to beta-catenin in vivo [4]. This family
also contains SPE proteins from C. elegans.
Number of members: 23 

[1] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[3] Medline: 98099802
Interaction of presenilins with the filamin family of
actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.
[4] Medline: 99004850
Destabilisation of beta-catenin by mutations in presenilin 1
potentiates neuronal apoptosis.
Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat
C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;
Nature 1998;395:698-702. 

442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature 
Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of
beta-N-5'-monophosphates from phosphoribosylpyrophosphate (PRPP) and an
enzyme-specific amine. A number of PRT's are involved in the biosynthesis of purine,
pyrimidine, and pyridine nucleotides, or in the salvage of purines and
pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in 
purine salvage. 

- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) 
(HGPRT or HPRT), which are involved in purine salvage. 

- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved 
in pyrimidine biosynthesis. 

- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine 
biosynthesis. 

- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is 
involved in purine salvage. 

In the sequence of all these enzymes there is a small conserved region which 
may be involved in the enzymatic activity and/or be part of the PRPP binding 
site [1]. 

-Consensus pattern: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-
[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR]

-Note: in position 11 of the pattern most of these enzymes have Gly. 
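
To make the note about position 11 concrete, the following sketch hand-translates the 13-residue consensus pattern above into a regular expression and reports which residue a matching sequence carries at position 11. The function name and the example peptides are hypothetical and were constructed only to satisfy the pattern; they are not sequences from this disclosure.

```python
import re

# Hand translation of the PRT consensus pattern (13 positions).
PRT_SIGNATURE = re.compile(
    r"[LIVMFYWCTA][LIVM][LIVMA][LIVMFC][DE]D[LIVMS]"
    r"[LIVM][STAVD][STAR][GAC].[STAR]"
)

def position_11_residue(sequence):
    """Return the residue at pattern position 11 of the first match, or None."""
    m = PRT_SIGNATURE.search(sequence)
    if m is None:
        return None
    return m.group(0)[10]  # position 11 is index 10 within the match

# Hypothetical peptides: the first carries Gly at position 11, the second Ala.
print(position_11_residue("MSKVLLVDDVLATGGTAKA"))
print(position_11_residue("MSKVLLVDDVLATAGTAKA"))
```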



[ 1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).

443. (Pro CA)

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the
reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in
recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by
cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1]. In
photosynthetic bacteria and plant chloroplast, CA is essential to inorganic carbon fixation [2].

Prokaryotic and plant chloroplast CA are structurally and evolutionarily related and form a
family distinct from the one which groups the many different forms of eukaryotic CA's (see
<PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301 from
Haemophilus influenzae also belong to this family. Two signature patterns were developed
for this family of enzymes. Both patterns contain conserved residues that could be involved
in binding zinc (cysteine and histidine).

-Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP] 

-Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem.
267:3731-3734(1992).

[ 2] Fukuzawa H., Suzuki R, Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A.
89:4437-4441(1992).

444. (Prolyl_oligopep) 

Prolyl oligopeptidase family serine active site 

The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related
peptidases whose catalytic activity seems to be provided by a charge relay system similar to
that of the trypsin family of serine proteases, but which evolved by independent convergent
evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is
an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence
of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium
meningosepticum and Aeromonas hydrophila); there is a high degree of sequence
conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves 
peptide bonds on the C-terminal side of lysyl and argininyl residues. 

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N- 
terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided 
that the penultimate residue is proline. 

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible 
for the proteolytic maturation of the alpha-factor precursor. 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2). 

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme
catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated
protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.

A conserved serine residue has experimentally been shown (in E. coli protease II as well as in
pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part
of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the
C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800
amino acids).

Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is
the active site residue]
Sequences known to belong to this class detected by the pattern: ALL,
except for yeast DPAP A.
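
The statement that the active site serine lies about 150 residues from the C-terminus can be checked on any candidate sequence once the pattern has been located. The sketch below hand-translates the consensus pattern into a regular expression with a capture group around the serine; the function name is hypothetical, and the demonstration sequence is synthetic filler built only to satisfy the pattern.

```python
import re

# Hand translation of:
# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2)
# The parentheses around S capture the putative active site serine.
ACTIVE_SITE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")

def locate_active_serine(sequence):
    """Return (1-based serine position, residues after the serine), or None."""
    m = ACTIVE_SITE.search(sequence)
    if m is None:
        return None
    idx = m.start(1)
    return idx + 1, len(sequence) - idx - 1

# Synthetic example: 10 leading residues, the 31-residue motif, 150 trailing residues.
motif = "D" + "QQQ" + "A" + "QQQ" + "L" + "Q" * 14 + "G" + "Q" + "S" + "Q" + "GG" + "LL"
demo = "M" * 10 + motif + "K" * 150
print(locate_active_serine(demo))
```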

Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4]. 
[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991). 
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

445. (Pterin 4a) 

Pterin 4 alpha carbinolamine dehydratase 

Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerisation cofactor of
hepatocyte nuclear factor 1-alpha).

Number of members: 11

[1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967 High-resolution structures of the
bifunctional enzyme and transcriptional coactivator DCoH and its complex with a product
analogue. Protein Sci 1996;5:1963-1972.

446. (Pyridox oxidase) 

Pyridoxamine 5'-phosphate oxidase signature

Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is a FMN flavoprotein involved in the de
novo synthesis of pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes
pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P. The sequences of the
enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources show
that this protein has been highly conserved throughout evolution.
PdxH is evolutionarily related [3] to one of the enzymes in the phenazine biosynthesis
protein pathway, phzD (also known as phzG). As a signature pattern, a highly conserved
region located in the C-terminal part of these enzymes was selected.

-Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R 

[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).

[ 2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).

[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett.
134:299-307(1995).

447. (Pyrophosphatase) 

Inorganic pyrophosphatase signature 

Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the 
hydrolysis of pyrophosphate (PPi) which is formed principally as the product of the many 
biosynthetic reactions that utilize ATP. All known PPases require the presence of divalent
metal cations, with magnesium conferring the highest activity. Among other residues, a
lysine has been postulated to be part of or close to the active site. PPases have been sequenced
from bacteria such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and 
Thermus thermophilus, from the archaebacteria Thermoplasma acidophilum, from fungi 
(homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of 
PPase has been characterized which seems to be involved in energy production and whose 
activity is stimulated by uncouplers of ATP synthesis. 

The sequences of PPases share some regions of similarity. As a signature pattern, a region
was selected that contains three conserved aspartates that are involved in the binding of
cations.

-Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations] 

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman
B.S. Biochim. Biophys. Acta 1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992). 



448. (Peptidase S26) 

Signal peptidases I signatures.

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from
secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which
is responsible for the processing of the majority of exported pre-proteins; type II (gene lsp),
which only processes lipoproteins; and a third type involved in the processing of pili subunits.
SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with
the main part of the protein protruding in the periplasmic space. Two residues have been
shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I
is evolutionarily related to the yeast mitochondrial inner membrane protease subunit 1 and 2
(genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the 
targeting of proteins from the mitochondrial matrix, across the inner membrane, into the 
inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an 
oligomeric enzymatic complex composed of at least five subunits: the signal peptidase 
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two 
components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well
as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with 
prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns have been developed for 
these proteins. The first signature contains the putative active site serine, the second signature 
contains the putative active site lysine which is not conserved in the SPC subunits, and the 
third signature corresponds to a conserved region of unknown biological significance which 
is located in the C-terminal section of all these proteins. 

Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an
active site residue]

Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992).
[ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).
[ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993).
[ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).
[ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites.

Eukaryotic thiol
proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby
histidine side chain; an asparagine completes the essential catalytic triad. The proteases
which are currently known to belong to this family are listed below (references are only
provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15) and
S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol
proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding
domain.

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM
(a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean
SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain
(EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea
turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice
oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494,
RD19A and RD21A.

- House-dust mites allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3,
cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31),
Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a
calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].

- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their
proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a
glycine.

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the
active site cysteine replaced by a serine. Rat testin should not be confused with mouse
testin which is a LIM-domain protein (see <PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This
protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site
cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as
signature patterns.

Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site
residue]
-Note: the residue in position 4 of the pattern is almost always cysteine; the only
exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser).
-Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV
where it is Glu.

Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-
[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-
[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue]
-Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the
classification of peptidases [7,E1].

[ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett.
357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem.
269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ.
Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1).
Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal
amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits
dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P
II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene
PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast
hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical
protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a
conserved region that contains three histidine residues was selected.

Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem.
105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem.
264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

Methionine aminopeptidase signatures (2).

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the
amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic
prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied
to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP
enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited
amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia
coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from
prokaryotes as well as eukaryotic MAP-1, while the second group is made up of
archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins
which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse
proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of
these subfamilies, a specific signature pattern that includes residues known to be involved
in cobalt-binding has been developed.

Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV]
[H is a cobalt ligand]

Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-
[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W.,
Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism
of a high number of natural compounds (such as steroids, fatty acids, prostaglandins,
leukotrienes, etc) as well as drugs, carcinogens and mutagens. Based on sequence similarities,
P450's have been classified into about forty different families [4,5]. P450's are proteins of
400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102) which is a protein of
1048 residues that contains an N-terminal P450 domain followed by a reductase domain.
P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is
involved in binding the heme iron in the fifth coordination site. From a region around this
residue, a ten residue signature was developed specific to P450's.

Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the
heme iron ligand]

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen
R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell
Biol. 12:1-51(1993).

[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

452. (Pec Lyase) Pectate lyase 

This enzyme forms a right-handed beta helix structure. Pectate lyase is an enzyme
involved in the maceration and soft rotting of plant tissue. 

[1] Yoder MD, Keen NT, Jurnak F, Science 1993;260:1503-1507. 

453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)
Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal
amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits
dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P
II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene
PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast
hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical
protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a
conserved region was selected that contains three histidine residues.

Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 
105:412-416(1989). 

[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990). 

[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem.
264:4476-4481(1989).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

Methionine aminopeptidase signatures (pep2)

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the
amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic
prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied
to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP
enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited
amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia
coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from
prokaryotes as well as eukaryotic MAP-1, while the second group is made up of
archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins
which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse
proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of
these subfamilies, a specific signature pattern was developed that includes residues known to
be involved in cobalt-binding.



Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV]
[H is a cobalt ligand]

Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-
[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W.,
Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).

[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).

[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

454. Peroxidases signatures 

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of
biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor.
Peroxidases are widely distributed throughout bacteria, fungi, plants, and vertebrates. In
peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme
iron is a histidine (known as the proximal histidine). Another histidine residue (the distal
histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the
enzyme. The regions around these two active site residues are more or less conserved in a
majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be
found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).

- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and
plays a major role in the oxygen-dependent microbicidal system of neutrophils.

- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial
agent.

- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of
eosinophils.

- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of
thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in
thyroglobulin to yield the thyroid hormones T3 and T4.

- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It
depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains
of lignin.

- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases.
Some of them play a role in cell-suberization by catalyzing the deposition of the aromatic
residues of suberin on the cell wall, some are expressed as a defense response toward
wounding, others are involved in the metabolism of auxin and the biosynthesis of lignin.

- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both
catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are:
catalase HP I from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.

Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-
[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active
site residue]

[ 1] Dawson J.H. Science 240:433-439(1988).
[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).
[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

455. pfkB family of carbohydrate kinases signatures

It has been shown [1,2,3] that the following carbohydrate and purine kinases are
evolutionarily related and can be grouped into a single family, which is known [1] as the
'pfkB family':

- Fructokinase (EC 2.7.1.4) (gene scrK).

- 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is
a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to
the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family.

- Ribokinase (EC 2.7.1.15) (gene rbsK).

- Adenosine kinase (EC 2.7.1.20) (gene ADK).

- 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene: kdgK).

- 1-phosphofructokinase (EC 2.7.1.56) (fructose 1-phosphate kinase) (gene fruK).

- Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk).

- Tagatose-6-phosphate kinase (EC 2.7.1.144) (phosphotagatokinase) (gene lacC).

- Escherichia coli hypothetical protein yeiC.

- Escherichia coli hypothetical protein yeiI.

- Escherichia coli hypothetical protein yhfQ.

- Escherichia coli hypothetical protein yihV.

- Bacillus subtilis hypothetical protein yxdC.

- Yeast hypothetical protein YJR105w.

All the above kinases are proteins of from 280 to 430 amino acid residues that share a few
regions of sequence similarity. Two of these regions were selected as signature patterns. The
first pattern is based on a region rich in glycine which is located in the N-terminal section
of these enzymes, while the second pattern is based on a conserved region in the C-terminal
section.



Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G

Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-
[LIVMSTAP]
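
The first pfkB pattern contains a variable-length gap, x(0,1), so a match can span either 24 or 25 residues. The sketch below illustrates this with a hand-translated regular expression; the function name and the synthetic test peptides are hypothetical and serve only to show how the optional residue affects the matched span.

```python
import re

# Hand translation of:
# [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
PFKB_NTERM = re.compile(r"[AG]G.{0,1}[GAP].N.[STA].{6}[GS].{9}G")

def nterm_signature_span(sequence):
    """Return (start, end) of the first match as 0-based half-open indices, or None."""
    m = PFKB_NTERM.search(sequence)
    return m.span() if m else None

# Synthetic peptides: identical except for the optional residue after "AG".
tail = "GQNQS" + "Q" * 6 + "G" + "Q" * 9 + "G"
without_gap = "MM" + "AG" + tail
with_gap = "MM" + "AG" + "Q" + tail
print(nterm_signature_span(without_gap))
print(nterm_signature_span(with_gap))
```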

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol.
173:3117-3127(1991).

[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

456. Phospholipase A2 active sites signatures

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the
second carbon group of glycerol. PA2's are small and rigid proteins of 120 amino-acid
residues that have four to seven disulfide bonds. PA2 binds a calcium ion which is required
for activity. The side chains of two conserved residues, a histidine and an aspartic acid,
participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards,
bees and mammals. In the latter, there are at least four forms: pancreatic, membrane-
associated as well as two less characterized forms. The venom of most snakes contains
multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit
neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two
different signature patterns were derived for PA2's. The first is centered on the active site
histidine and contains three cysteines involved in disulfide bonds. The second is centered on
the active site aspartic acid and also contains three cysteines involved in disulfide bonds.

Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue] This pattern will not
detect some snake toxins that are homologous with PA2 but have lost their catalytic activity,
nor otoconin-22, a Xenopus protein from the aragonitic otoconia which is also unlikely to
be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site
residue] This pattern detects the majority of functional and non-functional PA2's. Undetected
sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5 from mulga.
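
As an illustration of how the bracketed annotations ("H is the active site residue", "D is the active site residue") can be used, the sketch below hand-translates the two PA2 patterns into regular expressions and reports the 1-based position of the annotated residue in each hit. The function name active_sites and the test sequence are hypothetical; the sequence is synthetic and was constructed only to satisfy both patterns.

```python
import re

# Hand-translated from the two PA2 consensus patterns above.
# The named group "site" marks the annotated active site residue.
HIS_SITE = re.compile(r"CC.{2}(?P<site>H).{2}C")
ASP_SITE = re.compile(r"[LIVMA]C[^LIVMFYWPCST]C(?P<site>D).{5}C")


def active_sites(sequence: str) -> list:
    """Return (residue name, 1-based position) for every pattern hit."""
    hits = []
    for name, regex in (("His", HIS_SITE), ("Asp", ASP_SITE)):
        for m in regex.finditer(sequence):
            hits.append((name, m.start("site") + 1))
    return hits


# Synthetic sequence, not a real phospholipase.
toy_sequence = "AAACCGGHAACAAAAICNCDAAAAACAA"
print(active_sites(toy_sequence))
```

Reporting the position of the annotated residue, rather than only the start of the match, makes it straightforward to compare a hit against a known or suspected catalytic residue.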



[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990). 

[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart
M., Winand J., Christophe J. Eur. J. Biochem. 186:23-33(1989).

457. Phosphorylase pyridoxal-phosphate attachment site

Phosphorylases (EC 2.4.1.1) [1] are
important allosteric enzymes in carbohydrate metabolism. They catalyze the formation of
glucose 1-phosphate from polyglucose such as glycogen, starch or maltodextrin. Enzymes
from different sources differ in their regulatory mechanisms and their natural substrates.
However, all known phosphorylases share catalytic and structural properties. They are
pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine
residue around which the sequence is highly conserved and can be used as a signature pattern
to detect this class of enzymes.

Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P 
attachment site]- 

[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).

458. Protein kinases signatures and profile

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of
proteins which share a conserved catalytic core common to both serine/threonine and tyrosine
protein kinases. There are a number of conserved regions in the catalytic domain of protein
kinases. Two of these regions were selected to build signature patterns. The first region,
which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP
binding. The second region, which is located in the central part of the catalytic domain,
contains a conserved aspartic acid residue which is important for the catalytic activity of the
enzyme [6]. Two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was also developed
which is based on the alignment in [1] and covers the entire catalytic domain.

Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-
[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds
ATP]. The majority of known protein kinases belong to the class detected by this pattern, but
it fails to find a number of them, especially viral kinases, which are quite divergent in this
region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is
an active site residue]. Most serine/threonine-specific protein kinases are detected by this
pattern. The exceptions are 10 kinases (half of them viral kinases), plus Epstein-Barr
virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the
conserved Lys and which are therefore detected by the tyrosine kinase specific pattern
described below.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3)
[D is an active site residue]. All tyrosine-specific protein kinases, with the exception of
human ERBB3 and mouse blk, are detected by this pattern. This pattern will
also detect most bacterial aminoglycoside phosphotransferases [8,9] and herpesvirus
ganciclovir kinases [10], which are proteins structurally and evolutionarily related to protein
kinases. The profile also detects receptor guanylate cyclases and 2-5A-dependent
ribonucleases. Sequence similarities between these two families and the eukaryotic protein
kinase family have been noticed before. It also detects Arabidopsis thaliana kinase-like
protein TMKL1, which seems to have lost its catalytic activity. If a protein analyzed includes
the two protein kinase signatures, the probability of it being a protein kinase is close to 100%.
Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus
xanthus [11] and Yersinia pseudotuberculosis.
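
The statement that a protein carrying both protein kinase signatures is almost certainly a protein kinase suggests a simple combined test. The sketch below is illustrative only: the regular expressions are hand-translated from the three consensus patterns above, the function name kinase_signatures is hypothetical, and the two test sequences are synthetic strings built to satisfy the patterns rather than real kinase sequences.

```python
import re

# Hand-translated from the three consensus patterns above.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
SER_THR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")


def kinase_signatures(sequence: str) -> dict:
    """Report which of the signatures occur in the sequence."""
    atp = ATP_BINDING.search(sequence) is not None
    ser_thr = SER_THR_SITE.search(sequence) is not None
    tyr = TYR_SITE.search(sequence) is not None
    return {
        "atp_binding": atp,
        "ser_thr_active_site": ser_thr,
        "tyr_active_site": tyr,
        # Both signatures present: the N-terminal one plus either active site one.
        "both_signatures": atp and (ser_thr or tyr),
    }


# Synthetic sequences (assumptions for illustration, not database entries).
GLYCINE_RICH = "LGTGSFGRVMLVKHKETGNHYAMK"
ser_thr_like = "AAA" + GLYCINE_RICH + "AAAA" + "LIYRDLKPENLLI" + "AAA"
tyr_like = "AAA" + GLYCINE_RICH + "AAAA" + "LIHRDLAARNVLV" + "AAA"

print(kinase_signatures(ser_thr_like))
print(kinase_signatures(tyr_like))
```

Because the two active site patterns differ only at the position following the conserved D-[LIVMFY] pair (K versus [RSTAC]), a single hit can be assigned to at most one of them at a given site.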

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995). 

[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991). 

[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991). 

[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991). 

[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988). 

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S.,
Sowadski J.M. Science 253:407-414(1991).

[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988). 

[ 8] Benner S. Nature 329:21-21(1987). 

[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992). 

[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992). 

[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).

Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface
receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1].
These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding
domain, a single transmembrane region and a cytoplasmic kinase domain. However
they can be classified into at least five groups. The prototype for class II RTK's is the insulin
receptor, a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The
alpha and beta chains are cleavage products of a precursor molecule. The alpha chain
contains the ligand binding site; the beta chain traverses the membrane and contains the
tyrosine protein kinase domain. The receptors currently known to belong to class II are: -
Insulin receptor from vertebrates. - Insulin growth factor I receptor from mammals. - Insulin
receptor-related receptor (IRR), which is most probably a receptor for a peptide belonging to
the insulin family. - Insect insulin-like receptors. - Molluscan insulin-related peptide(s)
receptor (MIP-R). - Insulin-like peptide receptor from Branchiostoma lanceolatum. - The
Drosophila developmental protein sevenless, a putative receptor for positional information
required for the formation of the R7 photoreceptor cells. - The trk family of receptors
(NTRK1, NTRK2 and NTRK3), which are high affinity receptors for nerve growth factor and
related neurotrophic factors (BDNF and NT-3). And the following uncharacterized receptors:
- ROS. - LTK (TYK1). - EDDR1 (cak, TRKE, RTK6). - NTRK3 (Tyro10, TKT). - A sponge
putative receptor tyrosine kinase. While only the insulin and the insulin growth factor I
receptors are known to exist in the tetrameric conformation specific to class II RTK's, all the
above proteins share extensive homologies in their kinase domain, especially around the
putative site of autophosphorylation. Hence, a signature pattern was developed for this class
of RTK's, which includes the tyrosine residue, itself probably autophosphorylated.

Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation 
site] 

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

Receptor tyrosine kinase class III signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface
receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1].
These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding
domain, a single transmembrane region and a cytoplasmic kinase domain. However
they can be classified into at least five groups. The class III RTK's are characterized by the
presence of five to seven immunoglobulin-like domains [2] in their extracellular section.
Their kinase domain differs from that of other RTK's by the insertion of a stretch of 70 to 100
hydrophilic residues in the middle of this domain. The receptors currently known to belong to
class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a homo-
or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating
factor receptor (CSF-1-R) (also known as the fms oncogene). - Stem cell factor (mast cell
growth factor) receptor (also known as the kit oncogene). - Vascular endothelial growth
factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor Flk-2/Flt-3 [5]. -
The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTK's
which is based on a conserved region in the kinase domain.

Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).

[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).

[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C.,
Gospodarowicz D., Boehlen P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).

[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth
L.T., Picha K.S., McKenna H.J., Splett R.R. Cell 75:1157-1167(1993).

[ 6] Galland F., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil P., Rosnet
O., Birnbaum D. Oncogene 8:1233-1240(1993).

Receptor tyrosine kinase class V signatures

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface
receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1].
These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding
domain, a single transmembrane region and a cytoplasmic kinase domain. However
they can be classified into at least five groups on the basis of sequence similarities. The
extracellular domain of class V RTK's consists of a region of about 300 amino acids, including
16 conserved cysteines probably involved in disulfide bonds; this region is followed
by two copies of a fibronectin type III domain. The ligands for these receptors are proteins of
about 200 to 300 residues collectively known as Ephrins. The receptors currently known to
belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). -
EPHA3 (Etk-1; Hek; Mek4; Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). -
EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). - EPHA7 (Ehk-3; Hek11; Mdk-1;
Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk;
Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5
(Cek9). The EPHA subtype receptors bind to GPI-anchored ephrins while the EPHB subtype
receptors bind to type-I membrane ephrins. Two signature patterns were developed for this
class of RTK's, each of which includes some of the conserved cysteine residues.

Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-
[KRHQ]-[LIVA]-x(3)-[KR]-C-[PSAW] [The two C's are probably involved in disulfide
bonds]

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-
G-[HFY]-[EQ] [The three C's are probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988). 

[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991). 

[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A.
89:1611-1615(1992).



459. Protein kinase C terminal domain 



460. Plant thionins signature 

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert
their toxic effect at the level of the cell membrane but their exact function is not known. They
consist of a polypeptide chain of forty five to fifty amino acids with three to four internal
disulfide bonds. They are found in seeds but also in the cell wall of leaves [2]. Thionins are
processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein,
also belongs to this family. The pattern to detect this family of proteins includes three of the
six cysteine residues involved in disulfide bonds.

  xxCCxxxxxxxxxxxCxxxxxxxxxCxxxCxxCxxxxxCxxxxxxxx
    **************

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide
bonds] The proteins from the gamma-thionin family are not related to the above proteins and
are described in a separate section.

[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys.
238:18-29(1985).

[ 2] Bohlmann H., Clausen S., Behnke S., Giese H., Hiller C., Reimann-Phillip U., Schrader
G., Barkholt V., Apel K. EMBO J. 7:1559-1565(1988).

[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).

[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).

461. Polyprenyl synthetases signatures 

A variety of isoprenoid compounds are synthesized by various organisms. For example, in
eukaryotes the isoprenoid biosynthetic pathway is responsible for the synthesis of a variety of
end products including cholesterol, dolichol, ubiquinone or coenzyme Q. In bacteria this
pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and sugar carrier
lipids. Among the enzymes that participate in that pathway are a number of polyprenyl
synthetase enzymes which catalyze a 1'4-condensation between 5 carbon isoprene units.
Currently the sequence of some of these enzymes is known: - Eukaryotic farnesyl
pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10), which catalyzes the
sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl
pyrophosphate (DMAPP), and then with the resultant geranyl pyrophosphate to form farnesyl
pyrophosphate. FPP synthetase is a cytoplasmic dimeric enzyme. - Prokaryotic farnesyl
pyrophosphate synthetase (gene ispA). - Prokaryotic octaprenyl diphosphate synthase (gene
ispB). - Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30). - Eukaryotic
geranylgeranyl pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC
2.5.1.29), which catalyzes the sequential addition of the three molecules of IPP onto DMAPP
to form geranylgeranyl pyrophosphate. In plants GGPP synthase is a chloroplast enzyme
involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3),
this enzyme is involved in the biosynthesis of carotenoids. - Prokaryotic GGPP synthetases,
which are involved in the biosynthesis of carotenoids (gene crtE). Such an enzyme is also
encoded in the cyanelle genome of Cyanophora paradoxa. - Eukaryotic hexaprenyl
pyrophosphate synthetase, which is involved in the biosynthesis of coenzyme Q and which
catalyzes the formation of all trans-polyprenyl pyrophosphates generally ranging in length
between 6 and 10 isoprene units depending on the species. HP synthetase is a mitochondrial
membrane-associated enzyme. It has been shown [1 to 5] that all the above enzymes share
some regions of sequence similarity. Two of these regions are rich in aspartic-acid residues
and could be involved in the catalytic mechanism and/or the binding of the substrates.
Signature patterns were developed for both regions. Possible additional members of this
family of proteins are: - Bacillus subtilis spore germination protein C3 (gene gerC3). Both
proteins are most probably also enzymes involved in isoprenoid metabolism [6].

Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]- 

Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG] 


[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).

[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem.
108:995-1000(1990).

[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem.
266:5854-5859(1991).

[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.H., Schantz R., Camara B. Plant J.
2:25-34(1992).

[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).

[ 6] Bairoch A. Unpublished observations (1993).

462. Potato inhibitor I family signature

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors.
Members of this protein family are found in plants: in the seeds of barley or beans [1,2,3],
and in potato or tomato leaves where they accumulate in response to mechanical damage
[4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting to note
that, currently, this is the only proteinase inhibitor family to be found in both the plant and
animal kingdoms. Structurally these inhibitors are small (60 to 90 residues) and, in contrast with
other families of protease inhibitors, they lack disulfide bonds. They have a single inhibitory
site. The consensus pattern includes three out of the four residues conserved in all members
of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A. Barley
subtilisin-chymotrypsin inhibitor-2b has Glu instead of Gly. There is a trypsin inhibitor from the
cucurbitaceae Momordica charantia [7], which is said to belong to the potato inhibitor I
family but which shows only a very weak similarity with the other members of this family.

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).

[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).

[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem.
106:1003-1008(1989).

[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).

[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G.,
Pearson G.D., Ryan C.A. Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).

[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem.
361:1841-1846(1980).

[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

463. (pp binding) Phosphopantetheine attachment site

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier
proteins (ACP) in some multienzyme complexes where it serves as a 'swinging arm' for the
attachment of activated fatty acid and amino-acid groups [1]. Phosphopantetheine is attached
to a serine residue in these proteins [2]. ACP proteins or domains have been found in various
enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty
acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are
composed of eight separate subunits which correspond to the different enzymatic activities;
ACP is one of these polypeptides. Fungal FAS consists of two multifunctional proteins,
FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate
FAS consists of a single multifunctional enzyme; the ACP domain is located between the
beta-ketoacyl reductase domain and the C-terminal thioesterase domain [3]. - Polyketide
antibiotics synthase enzyme systems. Polyketides are secondary metabolites produced from
simple fatty acids by microorganisms and plants. ACP is one of the polypeptidic components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin,
granaticin, monensin, oxytetracycline and tetracenomycin C. - Bacillus subtilis putative
polyketide synthases pksK, pksL and pksM, which respectively contain three, five and one
ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS) from
Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a
polyketide antibiotic and which contains an ACP domain in the C-terminal extremity. -
Multifunctional mycocerosic acid synthase (gene mas) from Mycobacterium bovis. -
Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the first
step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene
tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that catalyzed by
grsA. - Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a
multifunctional protein that activates and polymerizes proline, valine, ornithine and leucine.
GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from
Saccharopolyspora erythraea, which are involved in the biosynthesis of the polyketide
antibiotic erythromycin. Each of these proteins contains two ACP domains. - Conidial green
pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi. This
enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains
three ACP domains. - Enterobactin synthetase component F (gene entF) from Escherichia
coli. This enzyme is involved in the ATP-dependent activation of serine during enterobactin
(enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3
from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only
contains a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum.
This enzyme synthesizes HC-toxin, a cyclic tetrapeptide. HTS1 contains four ACP domains. -
Fungal mitochondrial ACP [9], which is part of the respiratory chain NADH dehydrogenase
(complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the
phosphopantetheine attachment site is conserved in all these proteins and can be used as a
signature pattern. A profile was also developed that spans the complete ACP-like domain.

Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-
[DNEKHS]-S-[LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-
[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin
New-York (1988).

[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).

[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem.
198:571-579(1991).

[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini
A.M. Gene 130:65-71(1993).

[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem.
200:463-469(1991).

464. (Prenyltrans) Terpene synthases signature 

The following enzymes catalyze mechanistically related reactions which involve the highly
complex cyclic rearrangement of squalene or its 2,3 oxide: - Lanosterol synthase (EC
5.4.99.7) (oxidosqualene-lanosterol cyclase), which catalyzes the cyclization of
(S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones and
vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC
5.4.99.8) (2,3-epoxysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the
cyclization of (S)-2,3-epoxysqualene to cycloartenol. - Hopene synthase (EC 5.4.99.-)
(squalene-hopene cyclase), a bacterial enzyme that catalyzes the cyclization of squalene into
hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionarily
related [1] proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region
was selected which is rich in aromatic residues and which is located in the C-terminal section.

Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA] 

[ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628- 
11632(1993). 



465. Prion protein signatures 

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of
humans or animals infected with a number of degenerative neurological diseases such as
Kuru, Creutzfeldt-Jakob disease (CJD), scrapie or bovine spongiform encephalopathy (BSE).
PrP is encoded in the host genome and expressed both in normal and infected cells. It has a
tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting
of a signal peptide, followed by an N-terminal domain that contains tandem repeats of a short
motif (PHGGGWGQ in mammals, PHNPGY in chicken), itself followed by a highly
conserved domain. Finally comes a C-terminal hydrophobic domain post-translationally removed
when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

|Sig| Tandem repeats | C ... C |GPI|

'C': conserved cysteine involved in a disulfide bond.

As signature patterns for PrP, a perfectly conserved
alanine- and glycine-rich region of 16 residues was selected as well as a region centered on
the second cysteine involved in the disulfide bond.

Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y- 

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C
is involved in a disulfide bond]
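
The tandem repeats of the short motif in the N-terminal domain (PHGGGWGQ in mammals, PHNPGY in chicken) can be located with a simple scan. The following sketch is illustrative only: the function name tandem_repeats is hypothetical, and the test sequence is a synthetic string, not a real PrP sequence.

```python
import re


def tandem_repeats(sequence: str, motif: str) -> list:
    """Return (1-based start, number of consecutive copies) for each run of the motif."""
    runs = []
    for m in re.finditer("(?:" + re.escape(motif) + ")+", sequence):
        runs.append((m.start() + 1, len(m.group()) // len(motif)))
    return runs


MAMMALIAN_REPEAT = "PHGGGWGQ"

# Synthetic sequence: four consecutive copies, a spacer, then one isolated copy.
toy_sequence = "MA" + MAMMALIAN_REPEAT * 4 + "KK" + MAMMALIAN_REPEAT
print(tandem_repeats(toy_sequence, MAMMALIAN_REPEAT))
```

The scan only finds exact copies of the motif; degenerate repeats that differ by one or more residues would need a more tolerant comparison.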

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991). 

[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).

466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro
isomerase)

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the
immunosuppressive drug cyclosporin A (CSA). It exhibits a peptidyl-prolyl cis-trans
isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates
protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in
oligopeptides [2]. It is probable that CSA mediates some of its effects via an inhibitory action
on PPIase. Cyclophilin is a cytosolic protein which belongs to a family [3,4,5] that also
includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a PPIase which is
retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. -
Mitochondrial matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of
rhodopsin and is an integral membrane protein anchored by a C-terminal transmembrane
region. This protein was first characterized in Drosophila (gene ninaA). - Bacterial
periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative
tumor-recognition complex involved in the function of NK cells. It contains a cyclophilin-type
PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear pore complex protein
of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical
protein YJR032w. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis
elegans hypothetical protein T27D1.1. The sequences of the different forms of cyclophilin-type
PPIases are well conserved. As a signature pattern, a conserved region was selected in
the central part of these enzymes.

Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-
Q-[AG]-G. FKBP's, a family of proteins that bind the immunosuppressive drug FK506, are
also PPIases, but their sequence is not at all related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992). 

[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990). 

[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992). 

[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993). 

[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem.
270:14209-14213(1995).

467. Profilin signature 

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1
ratio, thus preventing the polymerization of actin into filaments (F-actin). It can also, in
certain circumstances, promote actin polymerization. Profilin also binds to
polyphosphoinositides such as PIP2. Overall sequence similarity among profilins from
organisms which belong to different phyla (ranging from fungi to mammals) is low, but the
N-terminal region is relatively well conserved. That region is thought to be involved in the
binding to actin. The signature pattern for profilin is based on conserved residues at the
N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).

Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
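
The leading '<' restricts the profilin pattern to the N-terminal extremity of the sequence, so the same residues occurring further inside a protein do not count as a hit. A minimal sketch of this behaviour follows; the regular expression is hand-translated from the pattern above, the function name has_profilin_signature is hypothetical, and the test sequences are synthetic strings chosen for illustration.

```python
import re

# '<' (N-terminal anchor) is translated to '^'; x(0,1) becomes '.{0,1}'.
PROFILIN_N_TERM = re.compile(r"^.{0,1}[STA].{0,1}W[DENQH].[YI].[DEQ]")


def has_profilin_signature(sequence: str) -> bool:
    """True when the anchored pattern matches at the start of the sequence."""
    return PROFILIN_N_TERM.search(sequence) is not None


# Synthetic sequences (assumptions for illustration only).
n_terminal_hit = "MAGWNAYIDNLM"
internal_only = "GGGGG" + n_terminal_hit  # same residues, but not at the N-terminus

print(has_profilin_signature(n_terminal_hit))
print(has_profilin_signature(internal_only))
```

The two optional x(0,1) positions allow for the presence or absence of an initiator residue and of one residue before the conserved tryptophan.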

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[ 2] Sohn R.H., Goldschmidt-Clermont P. BioEssays 16:465-472(1994). 

468. Protamine P1 signature

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin
during the haploid phase of spermatogenesis. They pack sperm DNA into a highly
condensed, stable and inactive complex. There are two different types of mammalian
protamine, called P1 and P2. P1 has been found in all species studied, while P2 is sometimes
absent. There seems to be a single type of avian protamine whose sequence is closely related
to that of mammalian P1 [1]. As a signature for this family of proteins, a conserved region
was selected at the N-terminal extremity of the sequence.

Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S- 

[ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989). 

469. Sperm histone P2 (protamine P2) 

This protein, also known as protamine P2, can substitute for histones in the chromatin of
sperm. The alignment contains both the sequence of the mature P2 protein and its propeptide.

470. Proteasome A-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and
archaebacterial multicatalytic proteinase complex that seems to be involved in an
ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the proteasome is
composed of about 28 distinct subunits which form a highly ordered ring-shaped structure
(20S ring) of about 700 Kd. Most proteasome subunits can be classified, on the basis of
sequence similarities, into two groups, A and B. Subunits that belong to the A-type group are
proteins of 210 to 290 amino acids that share a number of conserved sequence regions.
Subunits that are known to belong to this family are listed below. - Vertebrate subunits C2
(nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35.
- Yeast C1 (PRS1), C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. -
Arabidopsis thaliana subunits alpha and PSM30. - Thermoplasma acidophilum alpha-subunit.
In this archaebacterium the proteasome is composed of only two different subunits. As a
signature pattern for proteasome A-type subunits the best conserved region was selected,
which is located in the N-terminal part of these proteins.

Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-
[SAD]-x(2)-[SAG]. These proteins belong to family T1 in the classification of peptidases
[6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993). 

[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989). 

[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).

[ 4] Wilk S. Enzyme Protein 47:187-188(1993). 

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996). 

[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

Proteasome B-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and
archaebacterial multicatalytic proteinase complex that seems to be involved in an
ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the proteasome is
composed of about 28 distinct subunits which form a highly ordered ring-shaped structure
(20S ring) of about 700 Kd. Most proteasome subunits can be classified, on the basis of
sequence similarities, into two groups, A and B. Subunits that belong to the B-type group are
proteins of 190 to 290 amino acids that share a number of conserved sequence regions.
Subunits that are known to belong to this family are listed below. - Vertebrate subunits C5,
beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1.
- Yeast PRE1, PRE2 (PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila
L(3)73Ai. - Fission yeast pts1. - Thermoplasma acidophilum beta-subunit. In this
archaebacterium the proteasome is composed of only two different subunits. As a signature
pattern for proteasome B-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-
[LIVMSTAC](3)-[GAC]-[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These
proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993). 

[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).

[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).

[ 4] Wilk S. Enzyme Protein 47:187-188(1993). 

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996). 

[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a
pair of redox-active cysteines involved in the transfer of reducing equivalents from the FAD
cofactor to the substrate. On the basis of sequence and structural similarities [1] these
enzymes can be classified into two categories. The first category groups together the
following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes
thioredoxin reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide
dehydrogenase (EC 1.8.1.4), the E3 component of alpha-ketoacid dehydrogenase complexes.
- Mercuric reductase (EC 1.16.1.1). The sequence around the two cysteines involved in the
redox-active disulfide bond is conserved and can be used as a signature pattern.

Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site
disulfide bond]. In positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile)
with the exception of GR from plant chloroplasts and from cyanobacteria which have Ile-Arg
[7].

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model
P. Nature 352:172-174(1991).

[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).

[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989). 
[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991). 
[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995). 
[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J.
2:129-131(1991).

472. (pyridoxal deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site

Four different enzymes - all pyridoxal-dependent decarboxylases - seem to share regions of
sequence similarity [1,2,3,4], especially in the vicinity of the lysine residue which serves as
the attachment site for the pyridoxal-phosphate (PLP) group. These enzymes are: - Glutamate
decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of glutamate into the
neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated
types of HDC: those that use PLP as a cofactor (found in Gram-negative bacteria and
mammals), and those that contain a covalently bound pyruvoyl residue (found in
Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of
tryptophan to tryptamine. It also acts on 5-hydroxytryptophan and dihydroxyphenylalanine
(L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25) (TyrDC) which converts tyrosine into
tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes are
collectively known as group II decarboxylases [3,4].

Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]- 
x(2)-[LIVMFYWQ]-x(2)-[RK] [K is the pyridoxal-P attachment site] 

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990). 

[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E.,
Zahnow C.A. Proc. Natl. Acad. Sci. U.S.A. 87:733-737(1990).

[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).

[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem.
120:369-376(1996).

473. Regulator of chromosome condensation (RCC1) signatures

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to
chromatin and interacts with ran, a nuclear GTP-binding protein, to promote the loss of
bound GDP and the uptake of fresh GTP, thus acting as a guanine-nucleotide dissociation
stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important role in
the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission
yeast and BJ1 in Drosophila, is a protein that contains seven tandem repeats of a domain of
about 50 to 60 amino acids. As shown in the following schematic representation, the repeats
make up the major part of the length of the protein. Outside the repeat region, there is just a
small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

+----+-------+-------+-------+-------+-------+-------+-------+------------+
|N-t.|Rpt. 1 |Rpt. 2 |Rpt. 3 |Rpt. 4 |Rpt. 5 |Rpt. 6 |Rpt. 7 | C-terminal |
+----+-------+-------+-------+-------+-------+-------+-------+------------+
                                                             In Drosophila

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part of
the second repeat; this is the most conserved part of RCC1. The second is derived from
conserved positions in the C-terminal part of each repeat and detects up to five copies of the
repeated domain. The RCC1-type of repeat is also found in the X-linked retinitis pigmentosa
GTPase regulator [3].

Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T- 

Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]- 

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993). 

[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993). 

[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-
Wagemakers L.M., Bergen A.A.B., Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F.,
Berger W. Hum. Mol. Genet. 5:1035-1041(1996).
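
Since the second RCC1 pattern is stated to detect several copies of the repeated domain in one protein, counting hits rather than testing for a single match is the natural use. The following is a minimal sketch, assuming the pattern is hand-translated to a Python regular expression; the function name and the test sequence are hypothetical and the sequence is synthetic, not a real RCC1 protein.

```python
import re

# Second RCC1 pattern, hand-translated to a regular expression:
# [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")

def repeat_hit_positions(sequence):
    """Return the 1-based start positions of non-overlapping pattern hits."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(sequence)]

# Synthetic sequence: two copies of a matching segment separated by filler.
segment = "LSAGKDHTLVL"
toy_sequence = "MPPK" + segment + "DDEEKK" + segment + "RR"

print(repeat_hit_positions(toy_sequence))  # [5, 22]
```

`re.finditer` reports non-overlapping matches from left to right, which suits tandem repeats that do not share residues.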

474. RNA 3'-terminal phosphate cyclase signature (RCT) 

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of a
3'-phosphate to a 2',3'-cyclic phosphodiester at the end of RNA. The biological role of this
enzyme is unknown but it is likely to function in some aspects of cellular RNA processing.
The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme by
ATP; 2) the enzyme acts on the RNA 3'-terminal phosphate to produce an RNA 3'-terminal
diphosphate adenylate; 3) release of AMP and cyclisation by a non-catalytic nucleophilic
attack by the adjacent 2'-hydroxyl on the phosphorus in the diester linkage. This enzyme,
which has been characterized in human (where there seem to be at least three isozymes) and
Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects,
plants, fungi (gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of 36
to 42 Kd. The best conserved region, which is used as a signature pattern, is a glycine-rich
stretch of residues located in the central part of the sequence and which is reminiscent of
various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E.coli enzyme) could be the AMP-binding residue.

Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]- 

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997). 
[ 2] Filipowicz W., Vicente O. Meth. Enzymol. 181:499-510(1990).



475. REV protein (anti-repression trans-activator protein) 



476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis
[1]. At present two classes of RFs can be distinguished. Class I RFs bind to ribosomes that
have encountered a stop codon at their decoding site and induce release of the nascent
polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs and enhance
class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific
manner [2]: RF-1 (gene prfA) mediates UAA- and UAG-dependent termination while RF-2
(gene prfB) mediates UAA- and UGA-dependent termination. RF-1 and RF-2 are structurally
and evolutionarily related proteins which have been shown [3] to make up a family that also
contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which
recognizes the UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown
function. - Escherichia coli hypothetical protein yaeJ and a close Pseudomonas putida
homolog. A highly conserved region located in the central part of the 40 to 45 Kd RF-1/2 and
m-RF and in the N-terminal part of the 15 to 16 Kd RF-H and yaeJ is used as a signature
pattern.

Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to
prokaryotic-type class II RFs, which belong to the family of GTP-binding elongation factors,
nor to eukaryotic class I or class II RFs.

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acid Res. Mol. Biol.
52:293-335(1996).

[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990). 
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992). 
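
The codon specificity described for the two prokaryotic class I release factors can be written down as a small lookup table. This is only an illustrative sketch of the assignments stated in the entry above (RF-1 with UAA and UAG, RF-2 with UAA and UGA); the names `STOP_CODONS_BY_FACTOR` and `factors_for` are hypothetical.

```python
# Stop codons mediated by each prokaryotic class I release factor,
# as stated in the entry (RF-1: UAA/UAG, RF-2: UAA/UGA).
STOP_CODONS_BY_FACTOR = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}

def factors_for(codon):
    """Return the release factors that recognise a codon (RNA or DNA spelling)."""
    codon = codon.upper().replace("T", "U")  # accept DNA spelling such as TGA
    return sorted(
        factor
        for factor, codons in STOP_CODONS_BY_FACTOR.items()
        if codon in codons
    )

for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, factors_for(codon))
```

UAA is the only stop codon shared by both factors; a sense codon such as AUG returns an empty list.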

477. RIO1/ZK632.3/MJ0444 family signature 

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. -
Caenorhabditis elegans hypothetical protein ZK632.3. - Methanococcus jannaschii
hypothetical protein MJ0444. - Thermoplasma acidophilum hypothetical protein in rpoA2
3'region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while the
archaebacterial ones are half that size. The central part of these proteins is highly conserved.
The best conserved region is used as a signature pattern.

Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM] 
[ 1] Bairoch A. Unpublished observations (1997). 

478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature

A number of bacterial and plant toxins act by inhibiting protein synthesis in eukaryotic cells.
The toxins of the Shiga and ricin family inactivate 60S ribosomal subunits by an N-glycosidic
cleavage which releases a specific adenine base from the sugar-phosphate backbone of 28S rRNA
[1,2,3]. The toxins which are known to function in this manner are: - Shiga toxin from
Shigella dysenteriae [4]. This toxin is composed of one copy of an enzymatically active A
subunit and five copies of a B subunit responsible for binding the toxin complex to specific
receptors on the target cell surface. - Shiga-like toxins (SLT) are a group of Escherichia coli
toxins very similar in their structure and properties to Shiga toxin. The sequence of two types
of these toxins, SLT-1 [5] and SLT-2 [6], is known. - Ricin, a potent toxin from castor bean
seeds. Ricin consists of two glycosylated chains linked by a disulfide bond. The A chain is
enzymatically active. The B chain is a lectin with a binding preference for galactosides. Both
chains are encoded by a single polypeptide precursor. Ricin is classified as a type-II
ribosome-inactivating protein (RIP); other members of this family are agglutinin, also from
castor bean, and abrin from the seeds of the bean Abrus precatorius [7]. - Single chain
ribosome-inactivating proteins (type-I RIP) from plants. Examples of such proteins are:
barley protein synthesis inhibitors I and II, mongolian snake-gourd trichosanthin, sponge
gourd luffin-A and -B, garden four-o'clock MAP, common pokeberry PAP-S and soapwort
saporin-6 [7]. All these toxins are structurally related. A conserved glutamic acid residue has
been implicated [8] in the catalytic mechanism; it is located near a conserved arginine which
also plays a role in catalysis [9]. The signature that has been developed for these proteins
includes these catalytic residues.

Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-
[LIVM]-[EQS]-x(2)-[LIVMF] [E and R are active site residues]

[ 1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem.
171:45-50(1988).
[ 2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M.
EMBO J. 8:301-308(1989).
[ 3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M.
Biochimie 73:1157-1161(1991).
[ 4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D.
J. Bacteriol. 170:1116-1122(1988).
[ 5] Calderwood S.B., Auclair F., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J.
Proc. Natl. Acad. Sci. U.S.A. 84:4364-4368(1987).
[ 6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W.
FEMS Microbiol. Lett. 44:109-114(1987).
[ 7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[ 8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J.
Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[ 9] Monzingo A.F., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D.
J. Mol. Biol. 233:705-715(1993).

479. Bacterial RNA polymerase, alpha chain (RNA pol A bac)

Members of this family include the alpha subunit from eubacteria and alpha subunits from
chloroplasts. The alpha subunit of RNA polymerase consists of two independently folded
domains, referred to as the amino-terminal and carboxyl-terminal domains. The amino-terminal
domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of
the alpha subunit is conserved in prokaryotic and chloroplast RNA polymerases. There are
three regions of particularly strong conservation, two in the amino-terminal domain and one
in the carboxyl-terminal domain [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266.
[2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama A, Kyogoku Y;
Science 1995;270:1495-1497.
[3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203.
[4] Murakami K, Kimura M, Owens JT, Meares CF, Ishihama A;
Proc Natl Acad Sci USA 1997;94:1709-1714.

480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain
a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and
chloroplast polymerases). Each RNA polymerase complex contains two related members of
this family; in each case they are the two largest subunits.

[1] Falkenburg D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937. 

481. RNA polymerases H / 23 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC
2.7.7.6) transcribing different sets of genes. Each class of RNA polymerase is an assemblage
of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of
RNA polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides.
Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is
evolutionarily related to the C-terminal part of a 23 Kd component shared by all three forms of
eukaryotic RNA polymerases (gene RPB5 in yeast and POLR2E in mammals). As a signature
pattern a conserved region was selected which is located at the N-terminal extremity of
subunit H; this region contains two histidines that could play a role in the binding of a metal
ion.

Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[ 1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407- 
410(1992). 

[ 2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol.
Biol. 287:753-760(1999).

482. RNA polymerases K / 14 to 18 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC
2.7.7.6) transcribing different sets of genes. Each class of RNA polymerase is an assemblage
of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of
RNA polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides. A
component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and
which has been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene
rpb6 or rpo15), in human and in African swine fever virus [1] is evolutionarily related [2] to
archaebacterial subunit K (gene rpoK). The archaebacterial protein is colinear with the
C-terminal part of the eukaryotic subunit.

Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q 

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993). 
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994). 

483. RNA polymerases L / 13 to 16 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC
2.7.7.6) transcribing different sets of genes. Each class of RNA polymerase is an assemblage
of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of
RNA polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides. It
has been shown that small subunits of about 13 to 16 Kd found in all three types of
eukaryotic polymerases are highly conserved. Subunits known to belong to this family are: -
Budding yeast RPC19 subunit from RNA polymerases I and III [1]. - Budding yeast RPB11
subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene POLR2K) from RNA
polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus
jannaschii RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA
polymerase subunit L (gene rpoL) [3]. As a signature pattern a conserved region was selected
which is located at the N-terminal extremity of these polymerase subunits; this region
contains two cysteines that could play a role in the binding of a metal ion.

Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem.
266:15300-15307(1991).

[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993). 
[ 3] Langer D. EMBL/GenBank: X70805. 

484. RNA polymerases N / 8 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC
2.7.7.6) transcribing different sets of genes. Each class of RNA polymerase is an assemblage
of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of
RNA polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides.
Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily
related [2] to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases
(gene RPB10 in yeast and POLR2J in mammals) as well as to African swine fever virus
protein CP80R [3]. As a signature pattern a conserved region was selected which is located at
the N-terminal extremity of these polymerase subunits; this region contains two cysteines that
could play a role in the binding of a metal ion.

Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G- 

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A.
92:5768-5772(1995).

[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[ 3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela
E. Virology 208:249-278(1995).

485. Ribonuclease HII

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189. 

486. Ribonuclease PH signature 

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a
phosphorolytic exoribonuclease that removes nucleotide residues following the -CCA
terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid
residues. It is evolutionarily related to Caenorhabditis elegans hypothetical protein B0564.1.
As a signature pattern, the most highly conserved region was selected which is located in the
central part of these proteins.

Consensus pattern: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).

487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, 
Jansen-Durr P, Lavia P; Cell Growth Differ 1995;6:1213-1224. 



488. Rhodanese signatures 

Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the 
transfer of the sulfane atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In 
vertebrates, rhodanese is a mitochondrial enzyme of about 300 amino-acid residues involved 
in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue takes part in 
the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also 
thought to express a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - 
Escherichia coli sseA [3]. - Saccharopolyspora erythraea cysA [4]. - Synechococcus strain 
PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the transport of 
sulfur compounds. Two patterns for the rhodanese family were developed. They are based on 
highly conserved regions, one of which is located in the N-terminal region, the other at the
C-terminal extremity of the enzyme.

Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]
Consensus pattern: [FY]-[DEAP]-G-[SA]-W-x-E-[FYW]

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981).

[ 2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[ 3] Rudd K.E. Unpublished observations (1993).

[ 4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[ 5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol.
173:2751-2760(1991).

489. Ribonuclease III family signature 

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests
double-stranded RNA. It is involved in the processing of ribosomal RNA precursors and of some
mRNAs. RNase III is evolutionarily related [2] to the following proteins: - Fission yeast pac1,
a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific
nuclease that cleaves eukaryotic preribosomal RNA at various sites. - Caenorhabditis elegans
hypothetical protein F26E4.13. - Paramecium bursaria chlorella virus 1 protein A464R. -
Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast hypothetical
protein SpAC8A4.08c, a protein with an N-terminal helicase domain and a C-terminal RNase
III domain. - Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same
structure as SpAC8A4.08c. These proteins share regions of sequence similarity, one of which
is a highly conserved stretch of 9 residues which has been developed as a signature pattern.

Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]- 

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).



490. Rieske iron-sulfur protein signatures

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex
III) is a component of the electron transport chain of mitochondria and of some aerobic
prokaryotes; it catalyzes the oxidoreduction of ubiquinol and cytochrome c. In the chloroplast
of plants and in cyanobacteria, plastoquinone-plastocyanin reductase (EC 1.10.99.1) (also
known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of
plastoquinol and cytochrome f. One of the components of these electron transfer systems is an
iron-sulfur protein with a 2Fe-2S cluster, which is called the Rieske protein [1,2]. The Rieske
protein contains approximately 190 amino acid residues. The iron-sulfur cluster is complexed to
the protein through cysteine and histidine residues. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two
cysteines and a histidine. The first cysteine and the histidine are 2Fe-2S ligands while the
remaining cysteines form a disulfide bond [3]. Two conserved regions were selected as
signature patterns.

Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] 
[The second C is involved in a disulfide bond] 

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second
C is involved in a disulfide bond]

[ 1] Gatti F.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[ 2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988). 
[ 3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996). 
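
The bracketed annotations on the two Rieske patterns say which residues inside each match are cluster ligands and which form the disulfide bond. As a hedged sketch, capture groups in a hand-translated regular expression can report those positions directly; the function name `rieske_sites` and the test sequence are hypothetical and the sequence is synthetic, not a real Rieske protein.

```python
import re

# Hand-translated patterns; each capture group marks an annotated residue.
BOX1 = re.compile(r"(C)[TK](H)LG(C)[LIVST]")  # C-[TK]-H-L-G-C-[LIVST]
BOX2 = re.compile(r"(C)P(C)(H).[GSA]")        # C-P-C-H-x-[GSA]

def rieske_sites(sequence):
    """Return 1-based positions of the annotated residues, or None if a box is absent."""
    m1 = BOX1.search(sequence)
    m2 = BOX2.search(sequence)
    if m1 is None or m2 is None:
        return None
    return {
        # First Cys and the His of each box: 2Fe-2S ligands.
        "ligands": [m1.start(1) + 1, m1.start(2) + 1,
                    m2.start(1) + 1, m2.start(3) + 1],
        # Second Cys of each box: the disulfide bond.
        "disulfide": (m1.start(3) + 1, m2.start(2) + 1),
    }

# Synthetic sequence containing one copy of each box.
toy_sequence = "AAA" + "CTHLGCV" + "GGGGG" + "CPCHGS" + "AA"
print(rieske_sites(toy_sequence))
```

Requiring both boxes before reporting anything mirrors the statement that the two regions together contain all the residues that bind the cluster.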

491. Ribosomal protein L1 signature

Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia
coli, L1 is known to bind to the 23S rRNA. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial L1. - Algal and plant
chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast SSM1. As a
signature pattern, the best conserved region was selected, located in the central section of
these proteins. It is located at the end of an alpha helix thought to be involved in
RNA-binding.

Consensus pattern: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-
[KRAV]-G-x-[LIMF]-P-[DENSTKQ]

[ 1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.P., Nikulin A., Ossina N.,
Garber M., Jonsson B.-H., Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas
A. EMBO J. 15:1350-1359(1996).

[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

492. Ribosomal protein L10 signature 

Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a 
protein of 162 to 185 amino-acid residues which has only been found so far in eubacteria. A 
conserved region located in the N-terminal section of these proteins was used as a signature 
pattern. 

Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-
[LIM]-R

493. Ribosomal protein L10e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis 
of sequence similarities. One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant 
L10. - Caenorhabditis elegans L10 (F10B5.1). - Yeast L10 (QSR1). - Methanococcus 
jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A conserved region
located in the central section was selected as a signature pattern. 

Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V 

[ 1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem.
Biophys. Res. Commun. 225:952-956(1996).



494. Ribosomal protein L11 signature

Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In
Escherichia coli, L11 is known to bind directly to the 23S rRNA. It belongs to a family of
ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plants L12.
- Yeast L12 (YL15).

L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the
C-terminal section of these proteins was selected as a signature pattern. In Escherichia coli,
the C-terminal half of L11 has been shown [3] to be in an extended and loosely folded
conformation and is likely to be buried within the ribosomal structure.

Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-
[DENG]

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G.; Nucleic Acids Res. 18:4409-4416(1990). 
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.; Protein Seq. Data Anal.
5:301-313(1993).

[ 3] Choli T. Biochem. Int. 19:1323-1338(1989). 

495. Ribosomal protein L7/L12 C-terminal domain 

[1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579. 

496. Ribosomal protein L13 signature 

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L13 is known to be one of the early assembly proteins of
the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which,
on the basis of sequence similarities [1], groups: - Eubacterial L13. 

- Plant chloroplast L13 (nuclear-encoded). - Red algal chloroplast L13. 

- Archaebacterial L13. - Mammalian L13a (Tum P198). - Yeast Rp22 and Rp23.
L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a
conserved region was selected located in the C-terminal section of these 
proteins. 

Consensus pattern: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-
[LIVM]-x-[AIV]-[LFY]-x-[GDN]

[ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
497. Ribosomal protein L13e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of 
sequence similarities [1]. One of these families consists of: 

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 
(BBC1)). - Drosophila L13. - Plant L13. - Yeast probable L13 (YM9375.11c). 

These proteins have 199 to 218 amino-acid residues. As a signature pattern,
a stretch of about 16 residues in the first third of these proteins was selected.

-Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E 
[ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).

498. Ribosomal protein L14 signature 

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. 
In eubacteria, L14 is known to bind directly to the 23S rRNA. It belongs to a 
family of ribosomal proteins which, on the basis of sequence similarities [1], 
groups: - Eubacterial L14. - Algal and plant chloroplast L14. - Cyanelle L14. 

- Archaebacterial L14. - Yeast L17A. - Mammalian L23. 

- Caenorhabditis elegans L23 (B0336.10). - Higher eukaryotes mitochondrial L14.
- Yeast mitochondrial Yml38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern,
a conserved region located in the C-terminal half of these proteins was selected.

-Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal.
5:301-313(1993).
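
Many entries state where in the protein the signature lies, for example the C-terminal half for L14 in entry 498. The following Python sketch is one way to check this for a given hit. It assumes the L14 pattern has been translated by hand into the regular expression shown; the function name locate_signature and the toy sequence are hypothetical and were invented for this example.

```python
import re

# Hand translation of the L14 consensus pattern from entry 498:
# [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]
L14_REGEX = re.compile(r"[GA][LIV]{3}.{9,10}[DNS]G.{4}[FY].{2}[NT].{2}V[LIV]")


def locate_signature(sequence, regex):
    """Return (start, end, half) for the first hit, or None if there is none.

    start and end are 1-based, inclusive residue positions. half reports
    which half of the sequence contains the midpoint of the hit.
    """
    match = regex.search(sequence)
    if match is None:
        return None
    midpoint = (match.start() + match.end()) / 2
    if midpoint < len(sequence) / 2:
        half = "N-terminal half"
    else:
        half = "C-terminal half"
    return match.start() + 1, match.end(), half


# Toy sequence (not a real protein): 40 filler residues, then a 27-residue
# stretch built to satisfy the pattern, then 5 filler residues.
toy = ("M" + "S" * 39
       + "GLIV" + "K" * 9 + "DG" + "TTTT" + "F" + "KK" + "N" + "KK" + "VL"
       + "E" * 5)

print(locate_signature(toy, L14_REGEX))  # (41, 67, 'C-terminal half')
```

The variable gap x(9,10) is why the hit length is not fixed: the same pattern can cover 27 or 28 residues depending on the sequence.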


499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit.
In Escherichia coli, L15 is known to bind the 23S rRNA. It belongs to a family
of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded).

- Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29. 

- Fungi L27a (L29, CRP-1, CYH2). 

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, 
a conserved region was selected in the C-terminal section of these proteins. 


-Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]- 
x(3,4)-[LIVMFCA]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G 

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal.
5:301-313(1993).

500. Ribosomal protein L15e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities [1]. One of these families consists of:

- Mammalian L15. - Insect L15. - Plant L15. - Yeast YL10 (L13) (Rp15r).

- Thermoplasma acidophilum L15. 

These proteins have about 200 amino-acid residues. As a signature pattern,
a conserved region located in the central section was selected.

-Consensus pattern: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-
[IV]-x-R-G

[ 1] Zwickl P., Lupas A., Baumeister W.
Biochem. Biophys. Res. Commun. 209:684-688(1995).

501. Ribosomal protein L17 signature 

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. 
L17 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities, groups: - Eubacterial L17. 

- Yeast mitochondrial YmL8 (gene MRPL8). 

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is
twice as large (238 residues); the sequence of its N-terminal half is colinear
with that of eubacterial L17. As a signature pattern, a conserved region in
the N-terminal section was selected. 

-Consensus pattern: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T- 
x-[STAG]-[KR] 

502. Ribosomal protein L18e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18. 

- Yeast L18 (Rp28). - Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the
first third of these proteins has been selected as a signature pattern.

-Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-
[LIVM]-x-[RK]-[LIVM]

[ 1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).

503. Ribosomal L18p family

It has been shown that the amino terminal 93 amino acids
of Swiss:P09895 are necessary and sufficient to bind 5S 
rRNA in vitro. The carboxyl-terminal half of the protein, 
comprising amino acids 151-296, serves to localize the 
protein to the nucleolus [1]. 
Number of members: 26 

[1] 

Medline: 96212235 

Distinct domains in ribosomal protein L5 mediate 5 S rRNA 
binding and nucleolar localization. 
Michael WM, Dreyfuss G; 
J Biol Chem 1996;271:11571-11574. 

504. Ribosomal protein L19 signature 

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L19 is known to be located at the 30S-50S ribosomal 
subunit interface and may play a role in the structure and function of the 
aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins 
which, on the basis of sequence similarities, groups: - Eubacterial L19. 

- Red algal chloroplast L19. - Cyanelle L19. 

L19 is a protein of 120 to 130 amino-acid residues.
A conserved region in the C-terminal section has been selected as a signature pattern.
-Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQ^ 
[SA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R 

505. Ribosomal protein L19e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2]. 

- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archaebacterial ribosomal protein L19E.
These proteins have 148 to 203 amino-acid residues.
A stretch of about 20 residues in the N-terminal part of these
proteins has been selected as a signature pattern.

-Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]- 
[DN]-P 

[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G.
J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M.
Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R.
Nucleic Acids Res. 17:9679-9692(1989).


506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped
on the basis of sequence similarities. One of these families consists [1,2,3,
4] of: - Vertebrate L1 (L4). - Drosophila L1. - Plant L1. - Yeast L2 (Rp2).
- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region
in the N-terminal part of these proteins has been selected as a signature pattern.

-Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-
x(7)-[RK]-[GS]-H

[ 1] Rafti F., Gargiulo G., Manzi A., Malva C., Graziani F.
Nucleic Acids Res. 17:456-456(1989).
[ 2] Presutti C., Villa T., Bozzoni I.
Nucleic Acids Res. 21:3900-3900(1993).
[ 3] Bagni C., Mariottini P., Annesi F., Amaldi F.
Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).



507. Ribosomal protein L2 signature 
Ribosomal protein L2 is one of the proteins from the large ribosomal subunit.
In Escherichia coli, L2 is known to bind to the 23S rRNA and to have
peptidyltransferase activity. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial L2.
- Algal and plant chloroplast L2. - Cyanelle L2. - Archaebacterial L2.
- Plant L2. - Slime mold L2. - Marchantia polymorpha mitochondrial L2.

- Paramecium tetraurelia mitochondrial L2. - Fission yeast K5, K37 and KD4.

- Yeast YL6. - Vertebrate L8.

The best conserved region located in the C-terminal section of these proteins has been
selected as a signature pattern.

-Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE] 
[ 1] Marty I., Meyer Y. 

Nucleic Acids Res. 20:1517-1522(1992). 
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. 

Protein Seq. Data Anal. 5:301-313(1993). 

508. Ribosomal protein L20 signature 

Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L20 is known to bind directly to the 23S rRNA. It belongs 
to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial L20. - Algal and plant chloroplast L20. 
- Cyanelle L20. 

L20 is a protein of about 120 amino-acid residues. A conserved region located in the 

central section of these proteins has been selected as a signature pattern. 

-Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)- 

[RKHS] 

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993). 

509. Ribosomal protein L21e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian L21 [1]. - Entamoeba histolytica L21 [2]. 

- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [3]. 

- Halobacterium marismortui HL31 [4]. 

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid
residues. A conserved region in the central part of these proteins has been selected
as a signature pattern.

-Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G 
[ 1] Devi K.R.G., Chan Y.-L., Wool I.G.
Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D.
Mol. Biochem. Parasitol. 56:329-333(1992).
[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

510. Ribosomal protein L21 signature 

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit.
In Escherichia coli, L21 is known to bind to the 23S rRNA in the presence of 
L20. It belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities, groups: - Eubacterial L21. 

- Marchantia polymorpha chloroplast L21. - Cyanelle L21. 

- Spinach chloroplast L21 (nuclear-encoded). 

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form
of the spinach chloroplast L21 has 200 residues. A conserved region located in the
C-terminal section of these proteins has been selected as a signature pattern.

-Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

511. Ribosomal protein L22 signature 

Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L22 is known to bind 23S rRNA. It belongs to a family of 
ribosomal proteins which, on the basis of sequence similarities [1,2,3], 
groups: - Eubacterial L22. 

- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus 
instead of the chloroplast). - Cyanelle L22. - Archaebacterial L22. 

- Mammalian L17. - Plant L17. - Yeast YL17. 

A conserved region located in the C-terminal section of these proteins has
been selected as a signature pattern.

-Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-
[LIVMS]-x-[LIVM]

[ 1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D.
EMBO J. 10:3073-3078(1991).
[ 2] Madsen L.H., Kreiberg J.D., Gausing K.
Curr. Genet. 19:417-422(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

512. Ribosomal protein L23 signature 

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit.
In Escherichia coli, L23 is known to bind a specific region on the 23S rRNA;
in yeast, the corresponding protein binds to a homologous site on the 26S rRNA
[1]. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [2,3,4], groups: - Eubacterial L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A. 

- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25. 

- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20). 

A small conserved region in the C-terminal section of these proteins, which is
probably involved in rRNA-binding, has been selected as a signature pattern [2].

-Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]

[ 1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.F., Verbree E.C.,
Planta R.J. EMBO J. 4:2101-2107(1985).
[ 2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989). 
[ 3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992). 
[ 4] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

513. Ribosomal protein L24 signature 

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. 
L24 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities, groups: - Eubacterial L24.
- Plant chloroplast L24 (nuclear-encoded). - Red algal L24. - Vertebrate L26.

- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15). 

- A probable ribosomal protein from Sulfolobus acidocaldarius [1]. 

In their mature form, these proteins have 103 to 150 amino-acid residues. 

A conserved stretch of 20 residues in their N-terminal section has been selected as a
signature pattern.

-Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-
[GA]-x-[IV]

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).

514. Ribosomal protein L24e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists [1] of: 

- Mammalian ribosomal protein L24. 

- Yeast ribosomal protein L30A/B (Rp29) (YL21). 

- Kluyveromyces lactis ribosomal protein L30. 

- Arabidopsis thaliana ribosomal protein L24 homolog. 

- Haloarcula marismortui ribosomal protein HL21/HL22. 

- Methanococcus jannaschii MJ1201. 

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is
located in the N-terminal region of these proteins, has been selected as a signature pattern.
-Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D 
[ 1] Chan Y.-L., Olvera J., Wool I.G. 
Biochem. Biophys. Res. Commun. 202:1176-1180(1994). 

515. Ribosomal protein L27 signature 

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. 
L27 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2], groups: - Eubacterial L27. 

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27. 

- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).

The schematic relationship between these groups of proteins is shown below.

Eub. L27        Nxxxxxxxxx
Algal L27       Nxxxxxxxxx
Plant L27  tttttNxxxxxxxxxxxxx
Yeast MRP7   tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                 ***

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

-Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G
[ 1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

516. Ribosomal L28 family 

The ribosomal L28 family includes L28 proteins from bacteria
and chloroplasts. The L24 protein from yeast Swiss:P36525
also contains a region of similarity to prokaryotic L28
proteins. L24 from yeast is also found in the large
ribosomal subunit.
Number of members: 24 

517. Ribosomal protein L29 signature 

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. 
L29 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial L29. - Red algal L29. 

- Archaebacterial L29. - Mammalian L35. - Caenorhabditis elegans L35 (ZK652.4).

- Yeast L35. 

L29 is a protein of 63 to 138 amino-acid residues. 

A conserved region located in the central section of L29 has been selected as a 
signature pattern. 

-Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-
[KRHQS]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

518. Ribosomal protein L3 signature

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate 
in the formation of the peptidyltransferase center of the ribosome. It belongs 
to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2,3,4], groups: - Eubacterial L3. - Red algal L3. - Cyanelle L3. 

- Archaebacterial Halobacterium marismortui HmaL3 (HL1). 

- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1). 

- Arabidopsis thaliana L3 (genes ARP1 and ARP2). - Mammalian L3 (L4). 

- Mammalian mitochondrial L3. - Yeast mitochondrial YmL9 (gene MRPL9). 

A conserved region located in the central section of these proteins has been selected 
as a signature pattern. 

-Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R 
[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[ 2] Graack H.R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V.
Eur. J. Biochem. 206:373-380(1992).
[ 3] Herwig S., Kruft V., Wittmann-Liebold B.
Eur. J. Biochem. 207:877-885(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

519. Ribosomal protein L30 signature 

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. 
L30 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial L30. - Archaebacterial L30. 

- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8). 

- Yeast mitochondrial L33. 

L30 from eubacteria are small proteins of about 60 residues; those from
archaebacteria are proteins of about 150 residues. Eukaryotic L7 are proteins
of about 250 to 270 residues. The schematic relationship between the three
groups of proteins is shown below.

Eub. L30  NxxxxxxxxxxC
Arc. L30  NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7   NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC
          *******

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the
N-terminal half of the region common to all these proteins.

-Consensus pattern: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-
[IVT]-x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT]

[ 1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992).

520. Ribosomal protein L31 signature 

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. 
L31 is a protein of 66 to 97 amino-acid residues which has only been found so 
far in eubacteria and in some algal chloroplasts. 

A conserved region located in the central section of these proteins has been selected as 
a signature pattern. 

-Consensus pattern: H-P-F-[FY]-[TI]-x(9)-G-R-[AIV]-x-[KRQ] 

521. Ribosomal protein L31e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34. 

- Halobacterium marismortui HL30 [2]. 

These proteins have 87 to 128 amino-acid residues. 

A conserved region located in the central section has been selected as a signature pattern.

-Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[ 1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K.
Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U., Arndt E.
Biochim. Biophys. Acta 1050:56-60(1990).



522. Ribosomal protein L33 signature

Ribosomal protein L33 is one of the proteins from the large ribosomal subunit.
In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit.
L33 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2,3], groups: - Eubacterial L33. 

- Algal and plant chloroplast L33. - Cyanelle L33. 

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located 
in the central section of L33 has been selected as a signature pattern. 
-Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-
K-[FY]-[CSD]

[ 1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991). 
[ 2] Sharp P.M. Gene 139:129-130(1994). 
[ 3] Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 

523. Ribosomal protein L34 signature 

Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic 
ribosome. It is a small basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a 
family of ribosomal proteins which, on the basis of sequence similarities, groups: - 
Eubacterial L34. 

- Red algal chloroplast L34. - Cyanelle L34. 

A conserved region that corresponds to the N-terminal half of L34 has been selected
as a signature pattern. 

-Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R 
[ 1] Old I.G., Margarita D., Saint Girons I. 
Nucleic Acids Res. 20:6097-6097(1992). 

524. Ribosomal protein L34e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian L34. - Mosquito L31 [1]. - Plant L34 [2]. 

- Yeast putative ribosomal protein YIL052c. - Methanococcus jannaschii MJ0655. 
These proteins have 89 to 129 amino-acid residues. 

A conserved region located in the N-terminal section of these proteins has been
selected as a signature pattern.

-Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G 
[ 1] Lan Q., Niu L.L., Fallon A.M. 

Biochim. Biophys. Acta 1218:460-462(1994). 
[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G. 

Plant Mol. Biol. 25:761-770(1994). 

525. Ribosomal protein L35Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7). 

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1]. 
These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of 
these proteins has been selected as a signature pattern. 

-Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P 
[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995). 

526. Ribosomal protein L36 signature 

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic 
ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial L36. - Algal and plant chloroplast L36. - Cyanelle
L36. L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature
pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three
conserved cysteine residues has been developed.

Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-
H-x-Q-x-Q

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993). 
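
Several entries give the extent of the signature in residues, for example positions 11 to 36 (26 residues) for L36 in entry 526 and about 16 residues for L13e in entry 497. These figures can be checked against the patterns by summing the repeat counts of the pattern elements. The following Python sketch does this under the same reading of the notation as above; the function name pattern_span is hypothetical and is not part of the specification.

```python
import re

# One pattern element with an optional repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_span(pattern):
    """Return (shortest, longest) number of residues a pattern can cover."""
    shortest = longest = 0
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        low = int(match.group(1)) if match.group(1) else 1
        high = int(match.group(2)) if match.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


# L36 pattern (entry 526): the shortest span equals the 26 residues
# implied by "positions 11 to 36".
print(pattern_span(
    "C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q"
))  # (26, 27)

# L13e pattern (entry 497): a fixed stretch of 16 residues.
print(pattern_span("[KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E"))  # (16, 16)
```

Where the two numbers differ, the pattern contains at least one variable-length gap, such as x(3,4) in the L36 pattern.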



527. Ribosomal protein L36e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of
sequence similarities. One of these families consists of: - Mammalian L36 [1]. 

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4). 

- Candida albicans L39. - Yeast YL39. 

These proteins have 99 to 104 amino acids. 

A conserved region in the central part of these proteins has been selected as a signature 
pattern. 

-Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR] 
[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G.
Biochem. Biophys. Res. Commun. 192:849-853(1993). 

528. Ribosomal protein L39e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian L39 [1]. - Plants L39. - Yeast L46 [2]. - Archaebacterial L39e [3].
These proteins are very basic. About 50 residues long, they are the smallest 
proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal 
section of these proteins has been selected as a signature pattern. 

-Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R 
[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H.,
Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[ 3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

529. Ribosomal L40e family 

Bovine L40 has been identified as a secondary RNA binding 
protein [1]. L40 is fused to a ubiquitin protein [2]. 
Number of members: 27 

[1] 

Medline: 88203200 

RNA binding proteins of the large subunit of bovine 
mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW;
Nucleic Acids Res 1988;16:2565-2583.

[2]

Medline: 96011832
The carboxyl extensions of two rat ubiquitin fusion proteins 
are ribosomal proteins S27a and L40. 
Chan YL, Suzuki K, Wool IG; 
Biochem Biophys Res Commun 1995;215:682-690. 

530. (Ribosomal L44) Ribosomal protein L44e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian L44 [1]. - Trypanosoma brucei L44. 

- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41). 

- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues. 

A conserved region located in the C-terminal part of these proteins has been 
selected as a signature pattern. 

-Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C 
[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

531. Ribosomal protein L5 signature 

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large 
ribosomal subunit. It belongs to a family of ribosomal proteins which, on the 
basis of sequence similarities [1,2,3,4], groups: - Eubacterial L5. 

- Algal chloroplast L5. - Cyanelle L5. - Archaebacterial L5. - Mammalian L11.

- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A). 

- Plants mitochondrial L5. 

L5 is a protein of about 180 amino-acid residues. 

A conserved region located in the first third of these
proteins has been selected as a signature pattern.

-Consensus pattern: [LIVM]-x(2)-[LIVM]-[STAVC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]- 
x-[STAG]-[KRH]-x-[STA] 

[ 1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990). 
[ 2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991). 
[ 3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A.

Biochimie 73:679-682(1991). 
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. 

Protein Seq. Data Anal. 5:301-313(1993). 

532. ribosomal L5P family C-terminus 

This region is found associated with Ribosomal_L5. 
Number of members: 60 

533. Ribosomal protein L6 signatures 

Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In 
Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the 
aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of 
ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L6.

- Algal chloroplast L6.

- Cyanelle L6.

- Archaebacterial L6.

- Marchantia polymorpha mitochondrial L6.

- Yeast mitochondrial YmL6 (gene MRPL6).

- Mammalian L9.

- Drosophila L9.

- Plants L9.

- Yeast L9 (YL11).

While all the above proteins are evolutionarily related, it is very difficult to derive a
pattern that will find them all. Two patterns were therefore created, the first to detect
eubacterial, cyanelle and mitochondrial L6, the second to detect archaebacterial L6 as well as
eukaryotic L9.

-Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM] 

-Consensus pattern: Q-x(3)-[LIVM]-x(2H 

[KR] 

[ 1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).

[ 2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:136-140(1993).

[ 3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).

[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).



534. Ribosomal protein L6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive 
enhancer element binding protein 107). 

- Caenorhabditis elegans ribosomal protein L6 (R151.3). 

- Yeast ribosomal protein YL16A/YL16B. 

- Mesembryanthemum crystallinum ribosomal protein YL16-like. 

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved
region in the central part of these proteins has been selected as a signature
pattern.

-Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K 



535. Ribosomal protein L7Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Vertebrate L7A (SURF3) [1]. - Plant L7A. - Yeast L7A (YL5) (Rp6). 

- Yeast protein NHP2 [2]. - Yeast hypothetical protein YEL026w. 

- Bacillus subtilis hypothetical protein ylxQ. - Halobacterium marismortui Hs6. 

- Methanococcus jannaschii MJ1203. 

These proteins have 100 to 265 amino-acid residues. 
A conserved region located in the central section has been selected as a signature pattern.
-Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-^ 
[ 1] Colombo P., Yon J., Garson K., Fried M. 

Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992). 
[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991). 

536. Ribosomal protein L9 signature 

Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. 
In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs 
to a family of ribosomal proteins which, on the basis of sequence similarities 
[1,2], groups: - Eubacterial L9. - Cyanobacterial L9. 
- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9. 
A conserved region, located in the N-terminal section of these proteins has been selected 
as a signature pattern. 

-Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]- 
x(3)-[STN] 

[ 1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J.,

White S.W., Ramakrishnan V. EMBO J. 13:205-212(1994).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993).

537. Ribosomal protein S10 signature 

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S10 is known to be involved in binding tRNA to the 
ribosomes. It belongs to a family of ribosomal proteins which, on the basis 
of sequence similarities [1], groups: - Eubacterial S10. 

- Algal chloroplast S10. - Cyanelle S10. - Archaebacterial S10. 

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10. 

- Arabidopsis thaliana mitochondrial S10 (nuclear encoded). - Vertebrate S20. 

- Plant S20. - Yeast URP2. 

S10 is a protein of about 100 amino-acid residues. 
A conserved region located in the center of these proteins has been selected as a signature
pattern. 

-Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T 
[ 1] Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 

538. Ribosomal protein S11 signature

Ribosomal protein S11 [1] plays an essential role in selecting the correct
tRNA in protein biosynthesis. It is located on the large lobe of the small
ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the
basis of sequence similarities, groups [2]: - Eubacterial S11.

- Algal and plant chloroplast S11. - Cyanelle S11. - Archaebacterial S11.

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.

- Acanthamoeba castellanii mitochondrial S11. - Neurospora crassa S14 (crp-2).

- Yeast S14 (RP59 or CRY1). 

- Mammalian, Drosophila, Trypanosoma, and plant S14. 

- Caenorhabditis elegans S14 (F37C12.9). 

One of the best conserved regions in these proteins was selected as a signature 
pattern. 

-Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-
[LIVMF]-x-[LIVM]-x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN]
[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988). 
[ 2] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

539. Ribosomal protein S12 signature 

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S12 is known to be involved in the translation initiation 
step. It is a very basic protein of 120 to 150 amino-acid residues. S12 
belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial S12. - Archaebacterial S12. 
- Algal and plant chloroplast S12. - Cyanelle S12. 
- Protozoa and plant mitochondrial S12. - Yeast S28.

- Drosophila mitochondrial protein tko (Technical KnockOut). - Mammalian S23. 
The best conserved regions in these proteins, located in the center of each 
sequence have been selected as a signature pattern. 

-Consensus pattern: [RK]-x-P-N-S-[AR]-x-R 
[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

540. Ribosomal protein S12e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of 
sequence similarities. One of these families consists of: - Vertebrate S12 [1]. 

- Trypanosoma brucei S12 [2]. - Caenorhabditis elegans S12 (F54E7.2).

- Drosophila S12. - Yeast S12.

These proteins have 130 to 150 amino acids. 

A conserved region in the N-terminal part of these proteins has been selected 
as a signature pattern. 

-Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L 
[ 1] Lin A., Chan Y.-L., Jones R., Wool I.G.

J. Biol. Chem. 262:14343-14351(1987).
[ 2] Marchal C., Ismaili N., Pays E.

Mol. Biochem. Parasitol. 57:331-334(1993).

541. Ribosomal protein S13 signature 

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, 
hence, in the initiation of translation. It is a basic protein of 115 to 177 
amino-acid residues and belongs to a family of ribosomal proteins which, on 
the basis of sequence similarities [1,2], groups: - Eubacterial S13. 

- Plant chloroplast S13 (nuclear encoded). - Red algal chloroplast S13. 

- Cyanelle S13. - Archaebacterial S13. - Plant mitochondrial S13. 

- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal 
part have been selected as a signature pattern. 
-Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q
[ 1] Chan Y.-L., Paz V., Wool I.G. 

Biochem. Biophys. Res. Commun. 178:1212-1218(1991). 
[ 2] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature) 

Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In 
Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also 
be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a 
family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: 

- Eubacterial S14. 

- Algal and plant chloroplast S14. 

- Cyanelle S14. 

- Archaebacterial Methanococcus vannielii S14.

- Plant mitochondrial S14. 

- Yeast mitochondrial MRP2. 

- Mammalian S29. 

- Yeast YS29A/B. 

S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on 
the few conserved positions located in the center of these proteins. 

Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]

[1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993). 
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993). 

543. Ribosomal protein S15 signature 

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at 
early steps in ribosome assembly. It belongs to a family of ribosomal proteins 
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S15. 
- Archaebacterial Halobacterium marismortui HmaS15 (HS11).

- Plant chloroplast S15. - Yeast mitochondrial S28. - Mammalian S13. 

- Brugia pahangi and Wuchereria bancrofti S13 (S15). - Yeast S13 (YS15). 
S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been 
selected as a signature pattern. 

-Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)- 

[LIVM]-x(2)-[FY] 
[ 1] Dang H., Ellis S.R.

Nucleic Acids Res. 18:6895-6901(1990). 
[ 2] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

544. Ribosomal protein S16 signature

Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It 
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], 
groups: 

- Eubacterial S16. 

- Algal and plant chloroplast S16. 

- Cyanelle S16. 

- Neurospora crassa mitochondrial S24 (cyt-21). 

S16 is a protein of about 100 amino-acid residues. A conserved region located in the 
N-terminal extremity of these proteins has been selected as a signature pattern. 

Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR] 

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993). 

545. Ribosomal protein S17 signature 

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S17 is known to bind specifically to the 5' end of 16S 
ribosomal RNA and is thought to be involved in the recognition of termination 
codons. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1,2,3], groups: - Eubacterial S17.

- Plant chloroplast S17 (nuclear encoded). - Red algal chloroplast S17.

- Cyanelle S17. - Archaebacterial S17. - Mammalian and plant cytoplasmic S11.

- Yeast S18a and S18b (RP41; YS12). 

The best conserved regions located in the C-terminal sections of these proteins have 
been selected as a signature pattern. 

-Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S 
[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990). 
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B. 

Biol. Chem. Hoppe-Seyler 372:955-961(1991). 
[ 3] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

546. Ribosomal protein S17e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Vertebrates S17 [1]. - Drosophila S17 [2]. - Neurospora crassa S17 (crp-3). 

- Yeast S17a (RP51A) and S17b (RP51B) [3]. - Methanococcus jannaschii MJ0245. 
These proteins have from 63 (in archaebacteria) to 130 to 146 amino acids and
are highly conserved. A region in the central part of these proteins has been selected
as a signature.

-Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).

[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E.,

Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M.

Mol. Cell. Biol. 4:1871-1879(1984).

547. Ribosomal protein S18 signature 

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In
Escherichia coli, S18 has been implicated in aminoacyl-tRNA binding [1]. It appears to be
situated at the tRNA A-site of the ribosome. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [2], groups: - Eubacterial S18. - Algal and plant
chloroplast S18. - Cyanelle S18.

As a signature pattern, a conserved region in the central
section of the protein has been selected. This region contains two basic residues which may
be involved in RNA-binding.

Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]- [ST]- 
[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]- 

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

548. Ribosomal protein S19 signature 

Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S19 is known to form a complex with S13 that binds
strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins 
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S19. 

- Algal and plant chloroplast S19. - Cyanelle S19. - Archaebacterial S19. 

- Plant mitochondrial S19. - Eukaryotic S15 ('rig' protein). 

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is 
based on the few conserved positions located in the C-terminal section of 
these proteins. 

-Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]- 

[GAS]-[DE]-F-x(2)-[ST] 
[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., 

Okamoto H. FEBS Lett. 283:210-214(1991). 
[ 2] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

549. Ribosomal protein S19e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities [1,2]. One of these families consists 
of: - Mammalian S19. - Drosophila S19. 

- Ascaris lumbricoides S19g (ALEP-1) and S19s. - Yeast YS16 (RP55A and RP55B). 

- Aspergillus S16. - Halobacterium marismortui HS12. 
These proteins have 143 to 155 amino acids.

A well conserved stretch of 20 residues in the C-terminal part of these proteins has 
been selected as a signature pattern. 

-Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ] 
[ 1] Etter A., Aboutanos M., Tobler H., Mueller F. 

Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991). 
[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990). 
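
Several entries state the length of the conserved stretch (for example "20 residues" for S19e, a nonapeptide for S21e). As a rough sketch, that length can be checked directly from the pattern by summing the repeat counts of its elements. The helper name pattern_span is hypothetical, and the sketch assumes the same simplified subset of the pattern notation used in the consensus lines of this section.

```python
import re
from typing import Tuple

# One pattern element: a residue, x, or a bracketed class, with an optional
# repeat count "(n)" or range "(n,m)".
ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_span(pattern: str) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of residues a pattern can cover."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        low, high = m.groups()
        n_min = int(low) if low else 1        # no count means exactly one residue
        n_max = int(high) if high else n_min  # no upper bound means a fixed count
        shortest += n_min
        longest += n_max
    return shortest, longest


S19E = "P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]"
span = pattern_span(S19E)
print(span)  # (20, 20)
```

For a pattern with variable gaps, such as one containing x(0,1) or x(11,12), the two numbers differ and give the shortest and longest stretch the pattern can match.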

550. Ribosomal protein S2 signatures 

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. 
S2 belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2], groups: - Eubacterial S2. - Algal and plant chloroplast S2. 

- Cyanelle S2. - Archaebacterial S2. 

- Higher eukaryotes P40 (previously thought to be a laminin receptor). 

- Yeast NAM. - Plant mitochondrial S2. - Yeast mitochondrial MRP4. 
S2 is a protein of 235 to 394 amino-acid residues. 

Two conserved regions have been selected as signature patterns. One is 
located in the N-terminal section and the other in the central section. 

-Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]- 
[HY]-[LIVMF]-G 

-Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]- 
x-E-x(4)-[GNQKRH]-[LIVM]-[AP] 
[ 1] Davis S.C., Tzagoloff A., Ellis S.R. 

J. Biol. Chem. 267:5508-5514(1992). 
[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. 

FEBS Lett. 340:133-138(1994). 
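
Where a family is described by two signatures, as for S2 above, a sequence can be tested against both and the relative order of the hits compared with the stated locations (N-terminal, then central). The following is only an illustrative sketch: prosite_to_regex is a hypothetical helper covering a simplified subset of the notation, and the sequence is a hand-built string containing one instance of each pattern, not a real S2 protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        regex = "." if core == "x" else core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


S2_N_TERMINAL = ("[LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-"
                 "[HY]-[LIVMF]-G")
S2_CENTRAL = ("P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-"
              "x-E-x(4)-[GNQKRH]-[LIVM]-[AP]")

# Hypothetical sequence: filler, one N-terminal motif, filler, one central motif.
sequence = "GS" + "LAALLASGSHLG" + "QQQQ" + "PAALLLAGAAADAAALAEAAAAGLA" + "KK"

first = re.search(prosite_to_regex(S2_N_TERMINAL), sequence)
second = re.search(prosite_to_regex(S2_CENTRAL), sequence)

# True only when both signatures are found and appear in the expected order.
both_in_order = bool(first and second and first.start() < second.start())
print(both_in_order)
```

Requiring both hits is stricter than accepting either one; which rule is appropriate depends on how the signatures are meant to be used, which this sketch does not assume.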

551. Ribosomal protein S21 signature 

Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far 
S21 has only been found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A 
conserved region in the N-terminal section of the protein has been selected as a signature 
pattern. 
Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]

552. Ribosomal protein S21e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of 
sequence similarities. One of these families consists of: - Mammalian S21 [1]. 

- Caenorhabditis elegans S21 (F37C12.11). - Rice S21 [2]. 

- Yeast S21 (Ys25) [3]. - Fission yeast S28 [4]. 
These proteins have 82 to 87 amino acids. 

A perfectly conserved nonapeptide in the N-terminal part of these proteins has 
been selected as a signature pattern. 
-Consensus pattern: L-Y-V-P-R-K-C-S-[SA] 

[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993). 
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A. 

Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E.

Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Okata E., Matsui K.A.

Biochemistry 24:7418-7423(1985).

553. Ribosomal protein S24e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Vertebrate S24 [1]. - Yeast Rp50. - Mucor racemosus S24 [2]. 

- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii MJ0394. 
These proteins have 101 to 148 amino acids. 

A well conserved stretch in the central part of these proteins has been selected as 
a signature pattern. 

-Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]- 
[SDN] 

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.

Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M.

FEBS Lett. 224:65-70(1987).

554. Ribosomal protein S26e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of 
sequence similarities. One of these families consists of: - Mammalian S26 [1]. 

- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26. 

- Fungi S26 [4]. 

These proteins have 114 to 127 amino acids. 

A conserved octapeptide in the central part of these proteins has been selected as 
a signature pattern. 

-Consensus pattern: [YH]-C-V-S-C-A-I-H 

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. 

J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I.

Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I.

Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H.

Gene 150:401-402(1994).

555. Ribosomal protein S28e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3]. 

- Methanococcus jannaschii MJ1202. 

These proteins have from 64 to 78 amino acids. 

A highly conserved nonapeptide from the C-terminal extremity of these
proteins has been selected as a signature pattern.

-Consensus pattern: E-[ST]-E-R-E-A-R-x-L 

[ 1] Chan Y.-L., Olvera J., Wool I.G. 

Biochem. Biophys. Res. Commun. 179:314-318(1991). 
[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).
[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. 

Yeast 8:949-959(1992). 

556. Ribosomal protein S3Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian S3A (was originally known as v-fos transformation effector 
protein). - Caenorhabditis elegans S3A (F56F3.5). 

- Plant cytoplasmic S3A (CYC07) [1]. - Yeast Rp10 (PLC1 and PLC2).

- Fission yeast Rp10 (SpAC13G6.02c). - Methanococcus jannaschii MJ0980.
These proteins have from 220 to 250 amino acids. 

A conserved stretch in their N-terminal section was selected as a signature pattern. 
-Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L 
[ 1] Liu J.H., Reid D.M. 
Plant Physiol. 109:338-338(1995). 

557. Ribosomal protein S3 signature 

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S3 is known to be involved in the binding of initiator 
Met-tRNA. It belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities [1], groups: - Eubacterial S3. 

- Algal and plant chloroplast S3. - Cyanelle S3. - Archaebacterial S3. 

- Plant mitochondrial S3. - Vertebrate S3. - Insect S3. 

- Caenorhabditis elegans S3 (C23G10.3). - Yeast S3 (Rp13).
S3 is a protein of 209 to 559 amino-acid residues. 

A conserved region located in the C-terminal section has been selected as a signature pattern. 
-Consensus pattern: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-
x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G
[ 1] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993). 

558. Ribosomal protein S4 signature 

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S4 is known to bind directly to 16S ribosomal RNA. 
Mutations in S4 have been shown to increase translational error frequencies. 
It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups: - Eubacterial S4. - Algal and plant chloroplast S4.

- Cyanelle S4. - Archaebacterial S4. - Mammalian S9. - Yeast YS11 (SUP45).

- Marchantia polymorpha mitochondrial S4. - Dictyostelium discoideum rp1024.

- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for
ochre mutations in mitochondrial DNA. It could be a ribosomal protein that
acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9 which is
much larger). The signature pattern for this protein is based on a conserved
region located in the central section of these proteins.

-Consensus pattern: [LIVM]-[DE]-x-R-[LI]-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)- 

[STAGCVF]-x-[ST]-x(3)-[SAI]-[KR]-x-[LIVMF](2) 
[ 1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E.

Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J.,

Slonimski P., Szczesniak B., Kruszewska A.

Mol. Cell. Biol. 12:402-412(1992).

559. Ribosomal protein S4e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped
on the basis of sequence similarities. One of these families consists of: 

- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one
coded by a gene on chromosome Y, and the other on chromosome X.

- Plant cytoplasmic S4 [2]. - Yeast S7 (YS6). - Archaebacterial S4e.
These proteins have 233 to 264 amino acids. 

A highly conserved stretch of 15 residues in their N-terminal section has 
been selected as a signature pattern. Four positions in this region are positively 
charged residues. 

-Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP] 
[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., 
Lawrence J.B., Willard H.F., Bieber F.R., Page D.C. 
Cell 63:1205-1218(1990).
[ 2] Braun H.P., Emmermann M., Mentzel B., Schmitz U.K.
Biochim. Biophys. Acta 1218:435-438(1994).

560. Ribosomal protein S5 signature 

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S5 is known to be important in the assembly and function 
of the 30S ribosomal subunit. Mutations in S5 have been shown to increase 
translational error frequencies. It belongs to a family of ribosomal proteins 
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S5. 

- Cyanelle S5. - Red algal chloroplast S5. - Archaebacterial S5. 

- Mammalian S2 (LLrep3). - Caenorhabditis elegans S2 (C49H3.11). 

- Drosophila S2. - Plant S2. - Yeast S4 (SUP44). - Fungi mitochondrial S5. 
S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for 
this protein is based on a conserved region, rich in glycine residues, and 
located in the N-terminal section of these proteins. 

-Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-
x(2)-G-x-[LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVMA]-x(2)-A-
[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W.
Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

561. Ribosomal protein S6 signature 

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S6 is known to bind together with S18 to 16S ribosomal 
RNA. It belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities, groups: - Eubacterial S6. - Red algal chloroplast S6. 
- Cyanelle S6. 

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for 
this protein is based on a conserved region located in the N-terminal section 
of these proteins. 
-Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]

562. Ribosomal protein S6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities. One of these families consists of: 

- Mammalian S6 [1]. - Drosophila S6 [2]. - Plant S6 [3]. - Yeast S10 (YS4). 

- Halobacterium marismortui HS13 [4]. - Methanococcus jannaschii MJ1260. 
S6 is the major substrate of protein kinases in eukaryotic ribosomes [5]; it 
may have an important role in controlling cell growth and proliferation 
through the selective translation of particular classes of mRNA. 

These proteins have 135 to 249 amino acids. 

A conserved stretch of 12 residues in the N-terminal part of these
proteins has been selected as a signature pattern.

-Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M 

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).

[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J.

Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. 

Nucleic Acids Res. 20:5230-5230(1992). 
[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. 

Can. J. Microbiol. 35:195-199(1989). 
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. 

J. Biol. Chem. 268:4530-4533(1993). 

563. Ribosomal protein S7 signature 

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S7 is known to bind directly to part of the 3' end of 16S
ribosomal RNA. It belongs to a family of ribosomal proteins which, on the 
basis of sequence similarities [1,2,3], groups: - Eubacterial S7. 

- Algal and plant chloroplast S7. - Cyanelle S7. - Archaebacterial S7. 

- Plant mitochondrial S7. - Mammalian S5. - Plant S5. 

- Caenorhabditis elegans S5 (T05E11.1). 
The best conserved region located in the N-terminal section of these proteins has
been selected as a signature pattern. 

-Consensus pattern: [DENSK]-x-[LIVMDET]-x(3)-[LIVMFTA](2)-x(6)-G-K-[KR]-x(5)- 

[LIVMF]-[LIVMFC]-x(2)-[STAC] 
[ 1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. 

Biol. Chem. Hoppe-Seyler 374:305-312(1993). 
[ 2] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993). 
[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. 

Nucleic Acids Res. 23:4616-4619(1995). 

564. Ribosomal protein S7e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence 
similarities [1]. One of these families consists of:

- Mammalian S7.

- Xenopus S8.

- Insect S7.

- Yeast probable ribosomal protein S7 (N2212). 

- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c). 

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues which 
is located in the central section and which is rich in charged residues was selected as a 
signature pattern. 

Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H 

[1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993).

565. Ribosomal protein S8 signature 

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S8 is known to bind directly to 16S ribosomal RNA. It 
belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1], groups: - Eubacterial S8. - Algal and plant chloroplast S8.

- Cyanelle S8. - Archaebacterial S8. - Marchantia polymorpha mitochondrial S8. 

- Mammalian S15A. - Plant S15A. - Yeast S22 (S24). 

The best conserved region located in the C-terminal section of these proteins 
has been selected as a signature pattern. 

-Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-
[KRHAYI]
[ 1] Otaka E., Hashimoto T., Mizuta K. 

Protein Seq. Data Anal. 5:285-300(1993). 

566. Ribosomal protein S8e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped 
on the basis of sequence similarities [1]. One of these families consists of: 

- Mammalian S8. - Caenorhabditis elegans S8 (F42C5.8). - Leishmania major S8. 

- Plant S8. - Yeast S8 (S14) (Rpl9). - Archebacterial S8e. 

These proteins have either about 220 amino acids (in eukaryotes) or about 125 
amino acids (in archebacteria). A conserved stretch which is located in the 
N-terminal section and which is rich in positively charged residues has 
been selected as a signature pattern. 

-Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G
[ 1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B.
J. Protein Chem. 14:189-195(1995).

567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. 
It belongs to a family of ribosomal proteins which, on the basis of sequence 
similarities [1,2], groups: - Eubacterial S9. - Algal chloroplast S9. 

- Cyanelle S9. - Archaebacterial S9. - Mammalian S16. - Plant S16. 

- Yeast mitochondrial ribosomal S9. 

A conserved region containing many charged residues and located in the 
central section of these proteins has been selected as a signature pattern. 
-Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-
[GSAL]-[LIF]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 


568. Ribulose-phosphate 3-epimerase family signatures 

Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate
3-epimerase or PPE) is the enzyme that converts D-ribulose 5-phosphate into
D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle. In
Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]:
one is chromosomally encoded (cbbEC), the other one is on a plasmid (cbbeP).
PPE has been found in a wide range of bacteria, archaebacteria, fungi and
plants. The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).

- Escherichia coli protein sgcE. 

- Mycoplasma genitalium hypothetical protein MG112. 

All these proteins have from 209 to 241 amino acid residues. 
Two conserved regions which are located respectively in the N-terminal and in the
central part of these proteins have been selected as signature patterns.

-Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-
[STAV]

-Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]- 
[LIVMC] 

[1] Kusian B., Yoo J.G., Bednarski R., Bowien B.
J. Bacteriol. 174:7337-7344(1992). 

569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies. 


This family consists of a triplicated domain involved in 
cell agglutination in ricin. 

570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an
enzyme that accelerates protein folding by catalyzing the cis-trans
isomerization of proline imidic peptide bonds in oligopeptides [1]. Most
characterized PPIases belong to two families, the cyclophilin-type (see
<PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently a third family
has been discovered [2,3]. So far, the only biochemically characterized
member of this family is the Escherichia coli protein parvulin (gene ppiC), a
small (92 residues) cytoplasmic enzyme that prefers amino acid residues with
hydrophobic side chains like leucine and phenylalanine in the P1 position of
the peptide substrates. PpiC is evolutionarily related to a number of proteins
that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which 
contains a periplasmic ppiC-like domain anchored to the inner membrane and 
which seems to be involved in the folding of outer membrane proteins. 

- Escherichia coli surA. SurA is a periplasmic protein that contains two 
ppiC-like domains. 

- Nitrogen-assimilating bacteria protein nifM which is involved in the 
activation and stabilization of the iron-component (nifH) of nitrogenase. 

- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in 
protein export. 

- Lactococcus and lactobacillus protease maturation protein prtM, a 
membrane-bound lipoprotein involved in the maturation of a secreted serine 
proteinase. 

- Yeast protein ESS1/PTF1 (processing/termination factor 1). 

- Drosophila protein dodo (gene dod). 

- Mammalian protein PIN1. 

- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen. 

- Bacillus subtilis hypothetical protein yacD. 

- Helicobacter pylori hypothetical protein HP0175. 

- A hypothetical slime mold protein. 

A conserved region that contains a serine which could play a role in the catalytic 
mechanism of these enzymes has been selected as a signature pattern. 

-Consensus pattern: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS] 

[ 1] Fischer G., Schmid F.X. 
Biochemistry 29:2205-2212(1990). 
[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S., 
Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995). 
[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J., 
Mann K., Fischer G. FEBS Lett. 352:180-184(1994). 

571. (RrnaAD) Ribosomal RNA adenine dimethylases signature 

A number of enzymes responsible for the dimethylation of adenosines in 
ribosomal RNAs (EC 2.1.1.48) have been found [1,2] to be evolutionarily related. 
These enzymes are: 

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis 
of ribosomes by catalyzing the dimethylation of two adjacent adenosines in 
the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation 
of ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin. 

- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to 
ksgA and that dimethylates twin adenosines in the 3'-end of 18S rRNA. 

- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide- 
lincosamide-streptogramin B (MLS) antibiotics - such as erythromycin - by 
dimethylating the adenine residue at position 2058 of 23S rRNA thus 
resulting in a reduced affinity between ribosomes and the MLS antibiotics. 

- Caenorhabditis elegans hypothetical protein E02H1.1. 

The best conserved region in these enzymes is located in the N-terminal 
section and corresponds to a region that is probably involved in S-adenosyl 
methionine (SAM) binding. 

-Consensus pattern: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D 

[ 1] van Gemen B., van Knippenberg P.H. 
(In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A., 
Jones P.A., Eds., pp.19-36, Alan R. Liss Inc, New York, (1990). 
[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. 
J. Mol. Biol. 241:492-497(1994). 

572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members 

573. ATP/GTP-binding site motif A (P-loop) (ras) 

From sequence comparisons and crystallographic data analysis it has been shown 
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number 
of more or less conserved sequence motifs. The best conserved of these motifs is a 
glycine-rich region, which typically forms a flexible loop between a beta-strand and an 
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This 
sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. 
There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of 
protein families for which the relevance of the presence of such a motif has been noted are 
listed below: 

- ATP synthase alpha and beta subunits. 
- Myosin heavy chains. 
- Kinesin heavy chains and kinesin-like proteins. 
- Dynamins and dynamin-like proteins. 
- Guanylate kinase. 
- Thymidine kinase. 
- Thymidylate kinase. 
- Shikimate kinase. 
- Nitrogenase iron protein family (nifH/frxC). 
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7]. 
- DNA and RNA helicases [8,9,10]. 
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.). 
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.). 
- Nuclear protein ran. 
- ADP-ribosylation factors family. 
- Bacterial dnaA protein. 
- Bacterial recA protein. 
- Bacterial recF protein. 
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, GO, etc.). 
- DNA mismatch repair proteins mutS family. 
- Bacterial type II secretion system protein E. 

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of 
proteins escape detection because the structure of their ATP-binding site is completely 
different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the 
glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a 
slightly different form; this is the case for tubulins or protein kinases. A special mention must 
be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: 
in the last position Gly is found instead of Ser or Thr. 

Consensus pattern: [AG]-x(4)-G-K-[ST] 
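
As a rough illustration of how this pattern behaves, the sketch below scans two short fragments with the equivalent regular expression. Both fragments are illustrative assumptions written for this example (one Ras-like, ending the motif in Ser, and one adenylate-kinase-like, with Gly in the last position); they are not taken from the original text. A lookahead is used so that overlapping hits would also be reported.

```python
import re

# [AG]-x(4)-G-K-[ST], wrapped in a lookahead so overlapping hits are not skipped.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

def find_p_loops(sequence):
    """Return (1-based start, matched octapeptide) for every P-loop hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence)]

ras_like = "MTEYKLVVVGAGGVGKSALTIQ"   # illustrative fragment, motif ends in Ser
adk_like = "MRIILLGAPGAGKGTQAQF"      # illustrative fragment, Gly in the last position

print(find_p_loops(ras_like))
print(find_p_loops(adk_like))
```

The second fragment yields no hit, which mirrors the adenylate kinase deviation described above: a Gly where the pattern requires Ser or Thr.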

In addition to the proteins listed above, the 'A' motif is also found in a number of other 
proteins. Most of these proteins probably bind a nucleotide, but others are definitely not 
ATP- or GTP-binding (as for example chymotrypsin, or human ferritin light chain). 

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982). 
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985). 
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). 
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987). 
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990). 
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993). 
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990). 
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). 
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989). 
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989). 

GTP-binding nuclear protein ran signature (ras) 

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which 
has been implicated in a large number of processes including nucleocytoplasmic transport, 
RNA synthesis, processing and export and cell cycle checkpoint control [1,2]. Ran is 
generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is only 
slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks 
cysteine residues at its C-terminus and is therefore not subject to prenylation. Instead ran has 
an acidic C-terminus. It is, however, similar to RAS family members in requiring a specific 
guanine nucleotide exchange factor (GEF) and a specific GTPase activating protein (GAP) as 
stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, 
is perfectly conserved has been selected as a signature pattern. 

Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y 

Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop). 

[ 1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995). 
[ 2] Rush M.G., Drivas G., d'Eustachio P. BioEssays 18:103-112(1996). 
[ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991). 

574. recA signature 

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and 
recombinational repair of DNA damage. RecA has many activities: it forms filaments, it binds to 
single- and double-stranded DNA, it binds and hydrolyzes ATP, it is also a recombinase and, 
finally, it interacts with lexA, causing its activation and leading to its autocatalytic cleavage. 
RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved 
[3,4,5,E1] among eubacterial species. It is also found in the chloroplast of plants [6]. The best 
conserved region, a nonapeptide located in the middle of the sequence which is part of the 
monomer-monomer interface in a recA filament, has been selected as a signature pattern. 

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R 

[ 1] Smith K.C., Wang T.-C. V. BioEssays 10:12-16(1989). 
[ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993). 
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997). 
[ 4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995). 
[ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995). 
[ 6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992). 
[E1] http://www.tigr.org/~jeisen/RecA/RecA.html 

575. Response regulator receiver domain 

This domain receives the signal from the sensor partner in bacterial two-component 
systems. It is usually found N-terminal to a DNA binding effector domain. 

[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154. 

576. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of 
deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors 
necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed 
of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are 
regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and 
viruses. One of these regions has been selected as a signature pattern. 

Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA] 

[ 1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). 
[ 2] Reichard P. Science 260:1773-1777(1993). 

577. Ribonuclease T2 family histidine active sites 

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from 
Rhizopus niveus are structurally and functionally related 30 Kd glycoproteins [1] that cleave 
the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate 
(EC 3.1.27.1). A number of other RNAses have been found to be evolutionarily related to these 
fungal enzymes: 

- Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) 
that has several alleles. This gene prevents fertilization by self-pollen or by pollen bearing 
either of the two S-alleles expressed in the style. The self-incompatibility glycoprotein from 
several higher plants of the solanaceae family has been shown [2,3] to be a ribonuclease. 
- Phosphate-starvation induced RNAses LE and LX from tomato [4]. These two enzymes are 
probably involved in a phosphate-starvation rescue system. 
- Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5]. 
- Aeromonas hydrophila periplasmic RNAse. 
- Haemophilus influenzae hypothetical protein HI0526. 

Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of 
RNase T2 and Rh. These residues and the region around them are highly conserved in all the 
sequences described above. Two signature patterns have been developed, one for each of the 
two active-site histidines. The second pattern also contains a cysteine which is known to be 
involved in a disulfide bond. 

Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue] 
Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active 
site residue] [C is involved in a disulfide bond] 
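
Because both patterns annotate specific residues (the active-site histidine and, in the second, the disulfide-bonded cysteine), a scan can report where those residues fall in a sequence. The sketch below does this with capture groups. The helper name annotated_positions and the two toy fragments are made up for illustration and are not real RNase sequences.

```python
import re

# Parentheses capture the annotated residues so their positions can be reported.
SITE_1 = re.compile(r"[FYWL].[LIVM](H)GLWP")                      # H = active site
SITE_2 = re.compile(r"[LIVMF].{2}[HDGTY][EQ][FYW].[KR](H)G.(C)")  # H = active site, C = disulfide

def annotated_positions(regex, sequence):
    """Return 1-based positions of the captured residues in the first match, or None."""
    m = regex.search(sequence)
    if m is None:
        return None
    return [m.start(i) + 1 for i in range(1, regex.groups + 1)]

toy_1 = "AAFTIHGLWPDD"   # made-up fragment fitting the first pattern
toy_2 = "LAAHEFAKHGTC"   # made-up fragment fitting the second pattern

print(annotated_positions(SITE_1, toy_1))
print(annotated_positions(SITE_2, toy_2))
```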

[ 1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990). 
[ 2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990). 
[ 3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama F., Clarke A.E. Nature 342:955-957(1989). 
[ 4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993). 
[ 5] Meador J. III, Kennell D. Gene 95:1-7(1990). 
[ 6] Kawata Y., Sakiyama F., Hayashi F., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990). 
[ 7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992). 

578. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of 
deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors 
necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed 
of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are 
regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and 
viruses. One of these regions has been developed as a signature pattern. 

Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA] 

[ 1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). 
[ 2] Reichard P. Science 260:1773-1777(1993). 

579. RNase H 

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the 
retroviral replication cycle and is often found as a domain associated with reverse 
transcriptases. The structure is a mixed alpha+beta fold with three a/b/a layers. 

580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm) 

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain 
one or more copies of a putative RNA-binding domain of about 90 amino acids [1,2]. This 
region has been found in the following proteins: 

** Heterogeneous nuclear ribonucleoproteins ** 
- hnRNP A1 (helix destabilizing protein) (twice). 
- hnRNP A2/B1 (twice). 
- hnRNP C (C1/C2) (once). 
- hnRNP E (UP2) (at least once). 
- hnRNP G (once). 

** Small nuclear ribonucleoproteins ** 
- U1 snRNP 70 Kd (once). 
- U1 snRNP A (once). 
- U2 snRNP B" (once). 

** Pre-RNA and mRNA associated proteins ** 
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of 
mRNA to ribosomes (once). 
- Nucleolin (4 times). 
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once). 
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds 
nuclear localization sequences. 
- Poly(A) binding protein (PABP) (4 times). 

** Others ** 
- Drosophila sex determination protein Sex-lethal (Sxl) (twice). 
- Drosophila sex determination protein Transformer-2 (Tra-2) (once). 
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of 
neurons. 
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar 
to elav and which may play a role in neuron-specific RNA processing. 
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also 
bind to specific mRNAs. 
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III. 
- The 60 Kd Ro protein (once), a putative RNP complex protein. 
- A maize protein induced by abscisic acid in response to water stress, which seems to be an 
RNA-binding protein. 
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing 
and/or processing of chloroplast RNAs (twice). 
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with 
cellular proliferation and/or maturation. 
- Insulin-induced growth response protein Cl-4 from rat (twice). 
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against 
cytotoxic lymphocyte target cells and may be involved in apoptosis. 
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9]. 

Inside the putative RNA-binding domain there are two regions which are highly conserved. 
The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif), the 
second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of both 
motifs in the domain is shown in the following schematic representation: 

xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxxxxx 
RNP-2 RNP-1 

The RNP-1 motif has been used as a signature pattern for this type of domain. 

Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM] 

In most cases the residue in position 3 of the pattern is either Tyr or Phe. 
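
The third position of this pattern is written with curly braces, meaning that the listed residues are excluded rather than required. A minimal sketch of how that exclusion maps onto a negated character class is given below. The helper name is_rnp1 and the octapeptides tested are illustrative assumptions, not sequences quoted from the original text.

```python
import re

# [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]
# The braces list residues that are NOT allowed, so they become a negated class.
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")

def is_rnp1(octapeptide: str) -> bool:
    """True if the eight-residue string fits the RNP-1 pattern exactly."""
    return RNP1.fullmatch(octapeptide) is not None

print(is_rnp1("KGFGFVTF"))   # position 3 is Phe, which is allowed
print(is_rnp1("KGEGFVTF"))   # position 3 is Glu, which is excluded
```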

[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). 
[ 2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988). 
[ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990). 
[ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991). 
[ 5] Rebagliati M. Cell 58:231-232(1989). 
[ 6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990). 
[ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991). 
[ 8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992). 
[ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991). 



581. Rubredoxin signature 



Attorney No. 2750-1237P 

Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron 
atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, 
functionally interchangeable with ferredoxins. 

A conserved region that includes two of the cysteine residues that bind the iron atom 
has been selected as a pattern for these proteins. 

Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two Cs bind the iron atom] 

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because 
alkG has two rubredoxin domains. 
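
A protein with two rubredoxin domains is expected to give two separate hits, which can be checked by counting matches rather than stopping at the first one. The sketch below illustrates the idea on synthetic sequences built for this example. The domain fragment, the linker and the helper name pattern_starts are assumptions and do not reproduce the alkG sequence.

```python
import re

# [LIVM]-x(3)-W-x-C-P-x-C-[AGD]
RUBREDOXIN = re.compile(r"[LIVM].{3}W.CP.C[AGD]")

def pattern_starts(sequence):
    """Return the 1-based start of every non-overlapping match of the pattern."""
    return [m.start() + 1 for m in RUBREDOXIN.finditer(sequence)]

toy_domain = "LAAAWACPACG"                                   # synthetic fragment fitting the pattern
one_domain = "MK" + toy_domain + "EE"
two_domains = "MK" + toy_domain + "TTTTT" + toy_domain + "EE"

print(pattern_starts(one_domain))
print(pattern_starts(two_domains))
```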

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio 
vulgaris, possesses a C-terminal rubredoxin-like domain, but this domain is too divergent to 
be detected by the above pattern. 

[ 1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp.1-66, Wiley, 
New York, (1982). 
[ 2] Kok M., Oldenhuis R., van der Linden M.P.G., Meulenberg C.H.C., 
Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989). 
[ 3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991). 

582. (rvp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely distributed 
family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses 
and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which 
consist of two domains. Each domain contains an active site centered on a catalytic aspartyl 
residue. The two domains most probably evolved from the duplication of an ancestral gene 
encoding a primordial domain. Currently known eukaryotic aspartyl proteases are: 

- Vertebrate gastric pepsins A and C (also known as gastricsin). 
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese. 
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). 
- Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I from 
angiotensinogen in the plasma. 
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), 
mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), 
polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). 
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in 
posttranslational regulation of vacuolar hydrolases. 
- Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and 
thus acts as an antagonist of the mating pheromone. 
- Fission yeast sxa1 which is involved in degrading or processing the mating pheromones. 

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease 
which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the 
protease is encoded as a segment of a polyprotein which is cleaved during the maturation 
process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag 
polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl 
proteases and around the single active site of the viral proteases allows us to develop a single 
signature pattern for both groups of protease. 

Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue] 

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981). 
[ 2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). 
[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995). 

583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase) 

A reverse transcriptase gene is usually indicative of a mobile element such as a 
retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, 
including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, 
and caulimoviruses. Number of members: 1233 

[1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse 
transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362. 

584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures 

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of 
S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl 
donor for transmethylation and is also the propylamino donor in polyamine biosynthesis. In 
bacteria there is a single isoform of AdoMet synthetase (gene metK); there are two in 
budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a 
multigene family. The sequence of AdoMet synthetase is highly conserved throughout 
isozymes and species. Two signature patterns have been selected for this type of enzyme; the 
first is a hexapeptide which seems to be involved in ATP-binding; the second is an almost 
perfectly conserved glycine-rich nonapeptide. 

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH] 

Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE] 

[ 1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990). 

585. S1 RNA binding domain 

The S1 domain occurs in a wide range of RNA-associated proteins. It is structurally similar 
to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure. 

[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242. 

586. SAICAR synthetase signatures 

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) 
(SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: 
the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and 
aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, 
SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal 
domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole 
carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme 
have been selected as signature patterns for SAICAR synthetase. 

Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S 

Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G 

[ 1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992). 

587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures 

A variety of extracellular proteins from eukaryotes have been found to be evolutionarily 
related: 

- Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein 
(AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 
220 residues and probably contains eight disulfide bonds. 
- Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCP's. 
- Mammalian glioma pathogenesis-related protein (GliPR). 
- Lizard helothermine, a toxin that blocks ryanodine receptors. 
- Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. 
These proteins are potent allergens and are the main cause of allergic reactions to stings from 
insects of the hymenoptera family [3]. Ag5/3 are proteins of about 200 residues and contain 
four disulfide bonds. 
- Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during 
pathogen infection or other stress-related responses. They are proteins of about 130 to 140 
residues and probably contain three disulfide bonds. 
- Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These 
extracellular proteins are loosely associated with fruit body hyphal walls [5]. Sc7/14 are 
proteins of about 180 residues and probably contain two disulfide bonds. 
- Ancylostoma secreted protein from dog hookworm. 
- Yeast hypothetical proteins YJL078c, YJL079c and YKR013w. 

The exact function of these proteins is not yet known. Two conserved regions located in their 
C-terminal half have been selected as signature patterns. The second signature contains a 
cysteine which is known to be involved in a disulfide bond in Ag5. 

Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN] 
Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a disulfide bond] 

[ 1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992). 
[ 2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989). 
[ 3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993). 
[ 4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991). 
[ 5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993). 

588. SET domain 

SET domains appear to be protein-protein interaction domains. It has been 
demonstrated that SET domains mediate interactions with a family of proteins 
that display similarity with dual-specificity phosphatases (dsPTPases) [2]. 

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. 
[2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337. 

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589. Src homology 3 (SH3) domain profile 

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues 
first identified as a conserved sequence in the non-catalytic part of several cytoplasmic 
protein tyrosine kinases (e.g. Src, Abl, Lck) [IJ.Since then, it has been found in a great 
variety of other intracellular or membrane-associated proteins [2,3 ,4,5] .The SH3 domain has 
a characteristic fold which consists of five or six beta-strands arranged as two tightly packed 
anti-parallel beta sheets. The linker regions may contain short helices [6] .The function of the 
SH3 domain is not well understood. The current opinion is that they mediate assembly of 
specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains 
are found as single copies in a given protein, but there is a significant number of protein with 
two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in 
the following proteins: - Many vertebrate, invertebrate and retroviral cytoplasmic (non- 
receptor) protein tyrosine kinases. In particular in the Src, Abl, Bkt, Csk and ZAP70 families 
of kinases. - Mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2. - 
Mammalian phosphatidyl inositol 3-kinase regulatory p85 subunit. - Mammalian Ras 
GTPase-activating protein (GAP). - Adaptor proteins mediating binding of guanine 
nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis 
elegans sem-5 and Drosophila DRK. All of which have two SH3 domains. - Mammalian Vav 
oncoprotein, a guanine nucleotide exchange factor of the CDC24 family. - Some guanine- 
nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast 
ste6. - MAGUK proteins. These proteins consist of at least three types of domains: one or 
more copies of the DHR domain, a SH3 domain and a C-terminal guanylate kinase domain. 
Members of this family are: Drosophila lethal(l)discs large-1 tumor suppressor protein (gene 
Dlgl), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55, 
Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins 
SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Miscellanous 
proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian 
cytoplasmic protein Nek (3 copies), oncoprotein Crk (2 copies). - Chicken Src substrate 
p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein 
Hsl. - Mammalian dihydrouridine-sensitive L-type calcium channel beta (regulatory) subunit 
including the related human myasthenic syndrome antigen B (MSYB). - Mammalian 
neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a 
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potential homolog from Caenorhabditis elegans (B0303 .7). NCF-1 and -2 have two copies of 
the SH3 domain, while B0303.7 has four, - Some myosin heavy chains from amoebae, slime 
molds and yeast (gene MY03). - Vertebrate and Drosophila spectrin and fodrin alpha-chain. - 
Human amphiphysin. - Yeast actin-binding protein ABP1. - Yeast actin-binding protein 
5 SLA1 (3 copies). - Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 
copies). - Yeast BEMl-binding proteins BOI2 (BEB1) and BOB1 (BOI1). - Yeast fusion 
protein FUS1. - Yeast protein RSV167. - Yeast protein SSU81. - Yeast hypothetical proteins 
YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C 
(1 copy), YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c. - 
1 0 Caenorhabditis elegans hypothetical proteins F42H10.3. The profile developed to detect SH3 
domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker 
regions totaling 62 match positions. 

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [2] Musacchio A.,
Gibson T., Lehto V.P., Saraste M. FEBS Lett. 307:55-61(1992). [3] Pawson T., Schlessinger
J. Curr. Biol. 3:434-442(1993). [4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993).
[5] Pawson T. Nature 373:573-580(1995). [6] Kuriyan J., Cowburn D. Curr. Opin. Struct.
Biol. 3:828-837(1993). [7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).



590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)

Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the 
hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate 
and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form whereas only
one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate
containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the
sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA]
[K is the pyridoxal-P attachment site]
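
As an illustration of how a consensus pattern written in this notation might be applied, the following sketch translates the SHMT pattern into a regular expression and reports the position of the pyridoxal-P lysine. The function name `prosite_to_regex`, the constant names and the test peptide are hypothetical and are not part of the original disclosure; only the pattern itself is taken from the text above.

```python
import re

# One pattern element: a residue, a bracketed set, or 'x', with an optional repeat.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        core, lo, hi = m.groups()
        if core == "x":          # 'x' stands for any residue
            core = "."
        if lo and hi:            # x(4,7) -> .{4,7}
            core += "{%s,%s}" % (lo, hi)
        elif lo:                 # [ST](2) -> [ST]{2}
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)

SHMT_PATTERN = ("[DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-"
                "[PAC]-[RQ]-[GSA]-[GA]")
SHMT_REGEX = re.compile(prosite_to_regex(SHMT_PATTERN))

# Hypothetical peptide constructed to satisfy the pattern (not a real SHMT).
toy_peptide = "MM" + "DLASGSTHKTLAGPRGG" + "AA"

match = SHMT_REGEX.search(toy_peptide)
# Eight pattern positions precede the lysine, so it sits at offset 8 of the match.
lysine_index = match.start() + 8
print(match.group(), lysine_index + 1, toy_peptide[lysine_index])
```

The converter handles only the elements used in this section (single residues, bracketed sets, `x`, and repeat counts); other features of the notation would need additional cases.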

[ 1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994). 



591. SIS domain 

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and 
phosphosugar binding proteins. 



Attorney No. 2750-1237P 

[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure
1998;6:1047-1055.

592. (SKI) Shikimate kinase signature 

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of
the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and
in fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps
in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved
region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]

Proteins belonging to this family also contain a copy of the ATP/GTP-binding
motif 'A' (P-loop).
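
The effect of the variable spacer x(8,12) in the shikimate kinase pattern can be illustrated with a short sketch. The regular expression below is a hand translation of the consensus pattern; the helpers `toy_sequence` and `has_signature` and the alanine filler residues are hypothetical, chosen only to show which spacer lengths are accepted.

```python
import re

# Hand translation of
# [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]
SKI_REGEX = re.compile(r"[KR].{2}E.{3}[LIVMF].{8,12}[LIVMF]{2}[SA].G{3}.[LIVMF]")

def toy_sequence(spacer_length):
    """Build an artificial peptide whose variable spacer has the given length."""
    return "KAAEAAAL" + "A" * spacer_length + "LLSAGGGAL"

def has_signature(sequence):
    """Return True if the shikimate kinase signature occurs in the sequence."""
    return SKI_REGEX.search(sequence) is not None

# Spacers of 8 to 12 residues are accepted; shorter or longer ones are not.
for n in (7, 8, 12, 13):
    print(n, has_signature(toy_sequence(n)))
```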

593. SNAP-25 family 

SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of 
SNARE complexes. Members of this family contain a cluster of cysteine residues that can be 
palmitoylated for membrane attachment [2]. 

[1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell
1994;79:245-258. [2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D,
Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

594. SNF2 and others N-terminal domain 

This domain is found in proteins involved in a variety of 
processes including transcription regulation (e.g., SNF2, STH1, 
brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA
recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) 
as well as a variety of other proteins with little functional 
information (e.g., lodestar, ETL1).

595. Staphylococcal nuclease homologues (SNase)

Present in all three domains of cellular life. Four copies in the transcriptional coactivator
p100. These, however, appear to lack the active site residues of Staphylococcal nuclease.
Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87)
[SNase numbering in parentheses] are thought to be involved in substrate-binding and
catalysis.

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 
1997;321:125-132. 

596. SPRY domain

The SPRY domain is named from SPla and the RYanodine Receptor. Domain of unknown
function. Distant homologues are domains in butyrophilin/marenostrin/pyrin
homologues.

[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.



597. (SQS PSY) Squalene and phytoene synthases signatures 

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions
of sequence similarity: - Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate
farnesyltransferase) (SQS), which catalyzes the conversion of two molecules of farnesyl
diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic
pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a
head-to-head condensation of the two molecules of FPP to form presqualene diphosphate;
this intermediate is then rearranged in a NADP-dependent reduction, to form squalene. SQS
is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1
gene. SQS seems to be membrane-bound. - Phytoene synthase (EC 2.5.1.-) (PSY), which
catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into
phytoene. It is the second step in the biosynthesis of carotenoids from isopentenyl
diphosphate. The reaction carried out by PSY is catalyzed in two separate steps: the first is a
head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate;
this intermediate is then rearranged to form phytoene. PSY is found in all organisms that
synthesize carotenoids: plants and photosynthetic bacteria as well as some
non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY
is localized in the chloroplast. As can be seen from the description above, SQS and
PSY share a number of functional similarities which are also reflected at the level of their
primary structure. In particular, three well conserved regions are shared by SQS and PSY; they
could be involved in substrate binding and/or the catalytic mechanism. Signature patterns
have been developed for the second and third conserved regions; they are localized in the
central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]

Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P

[1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993). [2] Robinson G.W., Tsay
Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2727(1993).
[3] Roemer S., Hugueney P., Bouvier F., Camara B., Kuntz M. Biochem.
Biophys. Res. Commun. 196:1414-1421(1993).



598. SRP54-type proteins GTP-binding domain signature 

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and
insertion of the signal sequence of exported proteins into the membrane of the endoplasmic
reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54
Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it
emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding
site (G-domain) and are evolutionarily related to similar domains in other proteins, which are
listed below [1]. - Escherichia coli and Bacillus subtilis ffh protein (P48), a protein which
seems to be the prokaryotic counterpart of SRP54. Ffh is associated with a 4.5S RNA in the
prokaryotic SRP complex. - Signal recognition particle receptor alpha subunit (docking
protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP,
the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane.
The G-domain is located at the C-terminal extremity of the protein. - Bacterial ftsY protein, a
protein which is believed to play a similar role to that of the docking protein in eukaryotes.
The G-domain is located at the C-terminal extremity of the protein. - The pilA protein from
Neisseria gonorrhoeae which seems to be the homolog of ftsY. - A protein from the
archaebacterium Sulfolobus solfataricus. This protein is also believed to be a docking protein.
The G-domain is also at the C-terminus. - Bacterial flagellar biosynthesis protein flhF. The
best conserved regions in those domains are the sequence motifs that are part of the
GTP-binding site, but as those regions are not specific to these proteins, they were not used as a
signature pattern. Instead, a conserved region located at the C-terminal end of the domain was
selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]

[1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).



599. (STphosphatase) Serine/threonine specific protein phosphatases signature 
Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that
catalyze the removal of a phosphate group attached to a serine or a threonine residue. The
following enzymes are evolutionarily related. -
Protein phosphatase-1 (PP1) is an enzyme of broad specificity. It is inhibited by two
thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related isoforms
of PP-1: PP-1alpha and PP-1beta, produced by alternative splicing of the same gene. In
Emericella nidulans, PP-1 (gene bimG) plays an important role in mitosis control by
reversing the action of the nimA kinase. In yeast, PP-1 (gene SIT4) is involved in
dephosphorylating the large subunit of RNA polymerase II. - Protein phosphatase-2A (PP2A)
is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core
composed of a catalytic subunit associated with a 65 Kd regulatory subunit and a third
variable subunit. In mammals, there are two closely related isoforms of the catalytic subunit
of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein phosphatase-2B
(PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by
calmodulin. It is composed of two subunits: the catalytic A-subunit and the calcium-binding
B-subunit. The specificity of PP2B is restricted. In addition to the above-mentioned enzymes,
some additional serine/threonine specific protein phosphatases have been characterized and
are listed below. - Mammalian phosphatase-X (PP-X), and Drosophila phosphatase-V (PP-V)
which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is
similar to PP2A, but with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y),
and yeast phosphatases Z1 and Z2 (genes PPZ1 and PPZ2) which are closely related but yet
distinct from PP1. - Drosophila retinal degeneration protein C (gene rdgC), a calcium-binding
phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and
Phi-80 ORF-221 which have been shown to have phosphatase activity and are related to
mammalian PP's. The best conserved region in these proteins is a perfectly conserved
pentapeptide that can be used as a signature pattern.

Consensus pattern: [LIVM]-R-G-N-H-E

[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [2] Cohen P., Cohen P.T.W. J. Biol.
Chem. 264:21435-21438(1989). [3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J.
FEBS Lett. 268:355-359(1990).

600. Translation initiation factor SUI1 signature 

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that
functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the
proper start site of translation [1]. SUI1 is a protein of 108 residues. Close homologs of SUI1
have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily related to
hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and
Methanococcus vannielii. A conserved region in the C-terminal section has been selected as a
signature pattern.

Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [2] Fields C.A., Adams M.D.
Biochem. Biophys. Res. Commun. 198:288-291(1994).

601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related
pyridoxal-phosphate dependent enzymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine
dehydratase (EC 4.2.1.14) catalyze the dehydration of L-serine (respectively D-serine) into
ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the
dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and
other microorganisms, two classes of TDH are known to exist. One is involved in the
biosynthesis of isoleucine, the other in hydroxyamino acid catabolism. Threonine synthase (EC
4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation of
homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is
distantly related to the serine/threonine dehydratases. In all these enzymes, the
pyridoxal-phosphate group is attached to a lysine residue. The sequence around this residue is
sufficiently conserved to allow the derivation of a pattern specific to serine/threonine
dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA]
[The K is the pyridoxal-P attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino
C., Pitot H.C., Fujioka M. J. Biol. Chem. 264:15818-15823(1989). [2] Datta P., Goss T.J.,
Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A. 84:393-397(1987). [3] Parsot C.
EMBO J. 5:3013-3019(1986). [4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends
Biochem. Sci. 18:297-300(1993).

Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site
Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for
the formation of cysteine from O-acetyl-serine and hydrogen sulfide with the concomitant
release of acetic acid. In bacteria such as Escherichia coli, two forms of the enzyme are
known (genes cysK and cysM). In plants there are also two forms, one located in the
cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first
irreversible step in homocysteine transsulfuration: the conjugation of homocysteine and serine,
forming cystathionine. Like CSase, it is a pyridoxal-phosphate dependent enzyme. The two
types of enzymes are evolutionarily related. The pyridoxal-phosphate group of CSases has been
shown to be attached to a lysine residue which is located in the N-terminal section of these
enzymes; the sequence around this residue is highly conserved and can be used as a signature
pattern to detect this class of enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM]
[The 2nd K is the pyridoxal-P attachment site]

[1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [2] Swaroop M.,
Bradley K., Ohura T., Tahara T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem.
267:11455-11461(1992).



602. S locus glycop 

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self
recognition system. This is sporophytically controlled by multiple
alleles at a single locus (S). S-locus glycoproteins, as well as S-receptor kinases,
are in linkage with the S-alleles [1].
Number of members: 128

[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: 
synonymous and nonsynonymous base substitutions. Hinata K, Watanabe M, Yamakawa S,
Satta Y, Isogai A; Genetics 1995;140:1099-1104. [2] Polymorphism of the S-locus
glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus L. and 
self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; 
Mol Gen Genet 1998;258:397-403. 

603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 
Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a
membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulfur
protein, and a hydrophobic component composed of a cytochrome b and a membrane anchor
protein. The cytochrome b component is a mono-heme transmembrane protein [1,2,3]
belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). -
Cytochrome b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560
subunit encoded in the mitochondrial genome of some algae and in the plant Marchantia
polymorpha. - Cytochrome b from yeast mitochondrial SDH complex (gene SDH3 or CYB3).
- Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues
that comprise three transmembrane regions. There are two conserved histidines which may
be involved in binding the heme group. Two signature patterns have been developed that
include these histidine residues.

Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST]
[H could be a heme ligand]

Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA]
[H could be a heme ligand]

[1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992). [2]
Abraham P.R., Mulder A., Van't Riet J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994). [3]
Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger J.M., Kloareg B. J. Mol. Biol.
250:484-495(1995).
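
The two cytochrome b signatures above each mark a histidine that could be a heme ligand. A minimal sketch of how those positions might be reported is given below; the regular expressions are hand translations of the two consensus patterns with the annotated histidine placed in a capture group, and the function name `candidate_heme_ligands` and the artificial test sequence are assumptions made for illustration only.

```python
import re

# Hand translations of the two patterns; group 1 captures the annotated His.
SDH_CYT_1 = re.compile(r"RP[LIVMT].{3}[LIVM].{6}[LIVMWPK].{4}S.{2}(H)R.[ST]")
SDH_CYT_2 = re.compile(r"(H).{3}[GA][LIVMT]R[HF][LIVMF].[FYWM]D.[GVA]")

def candidate_heme_ligands(sequence):
    """Return 1-based positions of the histidines marked in either signature."""
    positions = []
    for regex in (SDH_CYT_1, SDH_CYT_2):
        for m in regex.finditer(sequence):
            positions.append(m.start(1) + 1)
    return sorted(positions)

# Artificial sequence holding one copy of each signature (not a real cytochrome b).
sig1_toy = "RPL" + "AAA" + "L" + "A" * 6 + "L" + "A" * 4 + "S" + "AA" + "HRAS"
sig2_toy = "H" + "AAA" + "GLRHL" + "A" + "FD" + "A" + "G"
toy = sig1_toy + "GGGG" + sig2_toy

print(candidate_heme_ligands(toy))
```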

604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and
general secretion. Halachmi N, Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40

605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components, with
secY and secA, of the preprotein translocase. In eukaryotes, the evolutionarily related protein
sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is
part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both secE and
sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane
region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses
an extra N-terminal segment of 60 residues that contains two additional transmembrane
domains). The sequence of secE/sec61-gamma is not extremely well conserved; however, it is
possible to derive a signature pattern centered on a conserved proline located 10 residues
before the beginning of the transmembrane domain.

Consensus pattern: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMFTA]-x-[KRV]-x(2)-[KW]-P- 
x(3)-[SEQ]-x(7)-[LIVT]-[LIVGA]-[LIVFGAST] 

[1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature
367:654-657(1994).



606. 11-S plant seed storage proteins signature 

Plant seed storage proteins, whose principal function appears to be the major nitrogen source 
for the developing plant, can be classified, on the basis of their structure, into different 
families. 11-S are non-glycosylated proteins which form hexameric structures [1,2]. Each of 
the subunits in the hexamer is itself composed of an acidic and a basic chain derived from a 
single precursor and linked by a disulfide bond. This structure is shown in the following 
representation.

           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
                                  *********
<---------Acidic-subunit----------><-----Basic-subunit------>
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape
cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, 
oat globulin, sunflower helianthinin G3, etc. The region that includes the conserved cleavage 
site between the acidic and basic subunits (Asn-Gly) and a proximal cysteine residue which is 
involved in the interchain disulfide bond have been used as a signature pattern for this family 
of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a
disulfide bond]
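
The relationship between this signature and the Asn-Gly cleavage site can be sketched as follows. The function `split_11s_precursor` and the toy precursor are hypothetical; the sketch assumes that cleavage occurs between the N and the G that open the pattern, as described above, and it is not offered as a validated model of precursor processing.

```python
import re

# Hand translation of N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D
SIGNATURE_11S = re.compile(r"NG.[DE]{2}.[LIVMF]C[ST].{11,12}[PAG]D")

def split_11s_precursor(precursor):
    """Return (acidic_chain, basic_chain), or None if the signature is absent."""
    m = SIGNATURE_11S.search(precursor)
    if m is None:
        return None
    cut = m.start() + 1  # cleave after the Asn that opens the signature
    return precursor[:cut], precursor[cut:]

# Artificial precursor, far shorter than a real protein of 480 to 500 residues.
toy_precursor = "MKCQE" + "NGAEETICT" + "A" * 11 + "PD" + "KR"
acidic, basic = split_11s_precursor(toy_precursor)
print(acidic, basic)
```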

[1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem.
172:627-632(1988). [2] Shotwell M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A.
Plant Physiol. 87:698-704(1988).

607. 7S seed storage protein 

7S globulin is one of the main storage proteins of most angiosperms and 
gymnosperms. The 7S storage proteins are homotrimers. 
Number of members: 67 

[1] The three-dimensional structure of canavalin from jack bean (Canavalia 
ensiformis). Ko TP, Ng JD, McPherson A; Plant Physiol 1993;101:729-744. 

608. Aspartate-semialdehyde dehydrogenase signature 

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common 
biosynthetic pathway leading from Asp to diaminopimelate and Lys, to Met, and to Thr; the 
NADP-dependent reductive dephosphorylation of L-aspartyl phosphate to L-aspartate- 
semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370 residues)
whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been
implicated as important for the catalytic activity [2]. The region of conservation around the
active site residue is too small to be used as signature pattern. Another more conserved 
region, located in the last third of the sequence, and which contains both a conserved cysteine 
as well as an histidine has been used instead. 

Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[1] Baril C., Richaud C., Fourni E., Baranton G., Saint Girons I. J. Gen. Microbiol.
138:47-53(1992). [2] Karsten W.E., Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

N-acetyl-gamma-glutamyl-phosphate reductase active site 

N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme
that catalyzes the third step in the biosynthesis of arginine from glutamate, the
NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate
5-semialdehyde. In bacteria it is a monofunctional protein of 35 to 38 Kd (gene argC) while in
fungi it is part of a bifunctional mitochondrial enzyme (gene ARG5,6, arg11 or arg-6) which
contains a N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal AGPR
domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the
catalytic activity; the region around this residue is well conserved and can be used as a
signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P
[C is the active site residue]

[1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [2]
Gessert S.F., Kim J.H., Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).

609. Sialyltransferase family
Number of members: 18 

610. SpoU rRNA Methylase family 

This family of proteins probably use S-AdoMet. Number of members: 58 

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases.
Koonin EV, Rudd KE; Nucleic Acids Res 1993;21:5519-5519. [2] The spoU gene of
Escherichia coli, the fourth gene of the spoT operon, is essential for tRNA (Gm18)
2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res
1997;25:4093-4097.

611. Stathmin family signatures 

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular
protein, present in a variety of phosphorylated forms and which serves as a relay for diverse 
second messenger pathways. Its expression and phosphorylation are regulated throughout 
development and in response to extracellular signals regulating cell proliferation, 
differentiation and function. Stathmin is a highly conserved protein of 149 amino acid 
residues. Structurally, it consists of an N-terminal domain of about 45 residues followed by a 
78 residue alpha-helical domain consisting of a heptad repeat coiled coil structure and a C- 
terminal domain of 25 residues. Protein SCG10 is a neuron-specific, membrane-associated 
protein that accumulates in the growth cones of developing neurons. It is highly similar in its
sequence to stathmin, but differs in that it contains an additional N-terminal hydrophobic
segment of 32 residues which is probably responsible for its interaction with membranes.
Xenopus protein XB3 is also evolutionarily related to stathmin and also contains an additional
N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central
region of the coiled coil have been selected as signatures for proteins of the stathmin family.

Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E
Consensus pattern: A-E-K-R-E-H-E-[KR]-E

[1] Sobel A. Trends Biochem. Sci. 16:301-305(1991). [2] Maucuer A., Moreau J., Mechali
M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).

612. SUA5/yciO/yrdC family signature

The following uncharacterized proteins have been
shown [1] to share regions of similarities: - Yeast protein SUA5. - Escherichia coli
hypothetical protein yciO and HI1198, the corresponding Haemophilus influenzae protein. -
Escherichia coli hypothetical protein yrdC and HI0656, the corresponding Haemophilus
influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae
hypothetical protein in rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical
protein MJ0062. These are proteins of 20 to 46 Kd which contain a number of conserved
regions in their N-terminal section. They can be picked up in the database by the following
pattern.

Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]

[1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).

613. Sucrose synthase 

Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This
family includes the bulk of the sucrose synthase protein. However, the carboxyl terminal
region of the sucrose synthases belongs to the glycosyl transferase family Glycos_transf_1.

614. Sulfotransferase proteins
Number of members: 59 

615. Synaptophysin / synaptoporin signature 

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane 
of synaptic vesicles, which may function as ionic or solute channels. These two glycoproteins 
seem to span the membrane four times. Both their N- and C-termini seem to be
cytoplasmically located. As a signature pattern for this family of proteins, a highly conserved
region located in the beginning of the first intravesicular loop just after the first 
transmembrane domain has been selected. This region contains a cysteine residue that may be 
involved in a disulfide bond. 

Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]

[1] Knaus P., Marqueze-Pouey B., Scherer R, Betz H. Neuron 5:453-462(1990).

616. Syndecans signature 

Syndecans [1,2] (from the Greek syndein, to bind together) are a family of transmembrane
heparan sulfate proteoglycans which are implicated in the binding of extracellular matrix
components and growth factors. Syndecans bind a variety of molecules via their heparan
sulfate chains and can act as receptors or as co-receptors. Structurally, these proteins consist
of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of
variable length whose sequence is not evolutionarily conserved in the various forms of
syndecans. The ectodomain contains the sites of attachment of the heparan sulfate
glycosaminoglycan side chains; c) A transmembrane region; d) A highly conserved
cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal
proteins. The proteins known to belong to this family are: - Syndecan 1. - Syndecan 2 or
fibroglycan. - Syndecan 3 or neuroglycan or N-syndecan. - Syndecan 4 or amphiglycan or
ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable syndecan (F57C7.3). The
signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This
region, which contains four basic residues, could act as a stop transfer site.

Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y

[1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J.
Annu. Rev. Cell Biol. 8:365-393(1992). [2] David G. FASEB J. 7:1023-1030(1993).
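
The statement that the syndecan signature region contains four basic residues can be checked by enumerating every peptide that the pattern [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y allows. This is a minimal sketch; the names `POSITIONS`, `variants` and `count_basic` are illustrative only, and lysine and arginine are the residues counted as basic.

```python
from itertools import product

# Allowed residues at each position of [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y
POSITIONS = ["FY", "R", "IM", "KR", "K", "K", "D", "E", "G", "S", "Y"]

# Every concrete peptide permitted by the pattern.
variants = ["".join(p) for p in product(*POSITIONS)]

def count_basic(peptide):
    """Count lysine and arginine residues in a peptide."""
    return sum(peptide.count(residue) for residue in "KR")

for v in variants:
    print(v, len(v), count_basic(v))
```

Each variant spans one transmembrane residue plus ten cytoplasmic residues, which is consistent with the description of the pattern above.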

617. Syntaxin / epimorphin family signature 

The following proteins have been shown to be evolutionarily related [1,2,3]: - Epimorphin (or
syntaxin 2), a mammalian mesenchymal protein which plays an essential role in epithelial
morphogenesis. - Syntaxin 1A (also known as antigen HPC-1) and syntaxin 1B, which are
synaptic proteins which may be involved in docking of synaptic vesicles at presynaptic active
zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic
vesicles at presynaptic active zones. - Syntaxin 5, which mediates endoplasmic reticulum to
Golgi transport. - Syntaxin 6, which is involved in intracellular vesicle trafficking. - Syntaxin
7. - Yeast PEP12 (or VPS6) which is required for the transport of proteases to the vacuole. -
Yeast SED5 which is required for the fusion of transport vesicles with the Golgi complex. -
Yeast SSO1 and SSO2 which are required for vesicle fusion with the plasma membrane. -
Yeast VAM3, which is required for vacuolar assembly. - Arabidopsis thaliana protein
KNOLLE which may be involved in cytokinesis. - Caenorhabditis elegans hypothetical
proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following
characteristics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly
hydrophobic and is probably involved in anchoring the protein to the membrane; a central,
well conserved region, which seems to be in a coiled-coil conformation. The pattern specific
for this family is based on the most conserved region of the coiled coil domain.

Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-
[LIVM]-x(2)-[LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-
[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM]
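
Because the syntaxin pattern is long, it can be useful to know how many residues a match must cover. The sketch below sums the minimum and maximum lengths of the elements of a consensus pattern; `pattern_span` is a hypothetical helper written for this illustration and it handles only the notation used in this section.

```python
import re

# One pattern element with an optional repeat such as (2) or (4,7).
_ELEMENT = re.compile(r"(?:\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_span(pattern):
    """Return (min_length, max_length) of any sequence matching the pattern."""
    shortest = longest = 0
    for element in pattern.replace(" ", "").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        lo = int(m.group(1)) if m.group(1) else 1   # no repeat means one residue
        hi = int(m.group(2)) if m.group(2) else lo  # no upper bound means fixed
        shortest += lo
        longest += hi
    return shortest, longest

SYNTAXIN_PATTERN = (
    "[RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-"
    "[LIVM]-x(2)-[LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-"
    "[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM]"
)

print(pattern_span(SYNTAXIN_PATTERN))
```

Every element of the syntaxin pattern has a fixed length, so the minimum and maximum coincide; a pattern containing a range such as x(2,3) would give two different values.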

[1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka
C.D., Scheller R.H. Cell 74:863-873(1993). [2] Spring J., Kato M., Bernfield M. Trends
Biochem. Sci. 18:124-125(1993). [3] Pelham H.R.B. Cell 73:425-426(1993).

618. Sm protein 

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein
particles (snRNPs) involved in pre-mRNA splicing contain seven
Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which
assemble around the Sm site present in four of the major
spliceosomal small nuclear RNAs. These proteins contain a
common sequence motif in two segments, Sm1 and Sm2, separated
by a short variable linker.

[1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO
J 1995;14:2076-2088. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker
VA, Luhrmann R, Li J, Nagai K; Cell 1999;96:375-387.

619. Skp1 family 

[1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461. 

620. Protein secY signatures 

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the 
signal sequences of secretory proteins as well as with two other components of the protein 
translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to 
492 amino acid residues that apparently contains ten transmembrane segments. Such a 
structure probably confers on secY a 'translocator' function, providing a channel for 
periplasmic and outer-membrane precursor proteins. Homologs of secY are found in 
archaebacteria [2]. SecY is also encoded in the chloroplast genome of some algae [3] where it 
could be involved in a prokaryotic-like protein export system across the two membranes of 
the chloroplast endoplasmic reticulum (CER) which is present in chromophyte 
and cryptophyte algae. Two signature patterns have been developed for secY proteins. The 
first corresponds to the second transmembrane region, which is the most conserved section of 
these proteins. The second spans the C-terminal part of the fourth transmembrane region, a 
short intracellular loop, and the N-terminal part of the fifth transmembrane region. 
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]- 
[GSTQ]-[LIVMFAT](3)-Q-[LIVMFA](2) 

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]- 
[NST]-G-x-[GST]-[LIVMF](3) 

[ 1] Ito K. Mol. Microbiol. 6:2423-2428(1992). [ 2] Auer J., Spicker G., Boeck A. Biochimie 
73:683-688(1991). [ 3] Douglas S.E. FEBS Lett. 298:93-96(1992). 

621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small 
hydrophilic plant seed proteins are structurally related: - Arabidopsis thaliana proteins GEA1 
and GEA6. - Cotton late embryogenesis abundant (LEA) protein D-19. - Carrot EMB-1 
protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late 
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice 
embryonic abundant protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein 
(DS10). - Wheat Em proteins. These proteins contain from 83 to 153 amino acid residues 
and may play a role [1,2] in equipping the seed for survival, maintaining a minimal level of 
hydration in the dry organism and preventing the denaturation of cytoplasmic components. 
They may also play a role during imbibition by controlling water uptake. As a signature 
pattern, the best conserved region in the sequence of these proteins was selected: a 
glycine-rich nonapeptide located in the N-terminal section. 

Consensus pattern: G-[EQ]-T-V-V-P-G-G-T 

[ 1] Dure L. III, Crouch M., Harada J., Ho T.-H. D., Mundy J., Quatrano R., Thomas T., Sung 
Z.R. Plant Mol. Biol. 12:475-486(1989). [ 2] Gaubier P., Raynal M., Hull G., Huestis G.M., 
Grellet F., Arenas C., Pages M., Delseny M. Mol. Gen. Genet. 238:409-418(1993). 

622. Serine carboxypeptidases, active sites 

All known carboxypeptidases are either metallo carboxypeptidases or 
serine carboxypeptidases. The catalytic activity of the serine carboxypeptidases, like that of 
the trypsin family serine proteases, is provided by a charge relay system involving an aspartic 
acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1]. 
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine 
carboxypeptidases I, II, and III [2]. - Yeast carboxypeptidase Y (YSCY) (gene PRC1), a 
vacuolar protease involved in degrading small peptides. - Yeast KEX1 protease, involved in 
killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable 
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium 
janthinellum carboxypeptidase S1 [4]. - Aspergillus niger carboxypeptidase pepF. - 
Aspergillus saitoi carboxypeptidase cpdS. - Vertebrate protective protein / cathepsin A [5], a 
lysosomal protein which is not only a carboxypeptidase but also essential for the activity of 
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) 
[6]. - Naegleria fowleri virulence-related protein Nf314 [7]. - Yeast hypothetical protein 
YBR139w. - Caenorhabditis elegans hypothetical proteins C08H9.1, F13D12.6, F32A5.3, 
F41C3.5 and K10B2.2. This family also includes: - Sorghum (S)-hydroxymandelonitrile lyase 
(hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences 
surrounding the active site serine and histidine residues are highly conserved in all these 
serine carboxypeptidases. 

Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue] 
Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x- 
[IVAQ]-P-x(3)-[PSA] [H is the active site residue] 
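
Because each of the two patterns above marks one residue as the active site residue, a match can also report where that residue sits in the sequence. The sketch below is illustrative only: the regular expression is a hand translation of the first consensus pattern, the names SER_SITE and active_site_positions are hypothetical, and the example peptide is synthetic rather than taken from a real carboxypeptidase.

```python
import re

# Hand translation of [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS]; the parentheses capture
# the serine that the pattern annotates as the active site residue.
SER_SITE = re.compile(r"[LIVM].[GTA]E(S)Y[AG][GS]")


def active_site_positions(sequence: str) -> list:
    """Return the 1-based position of the captured serine for every match."""
    return [m.start(1) + 1 for m in SER_SITE.finditer(sequence)]


example = "MKLAGESYAGTT"  # synthetic peptide used only to exercise the pattern
print(active_site_positions(example))
```

The same approach would apply to the histidine pattern by placing the capturing parentheses around the H.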

[ 1] Liao D.I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [ 2] Sorensen S.B., 
Svendsen I., Breddam K. Carlsberg Res. Commun. 54:193-202(1989). [ 3] Imai Y., 
Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [ 4] Svendsen I., Hofmann T., Endrizzi 
J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993). [ 5] Galjart N.J., Morreau H., 
Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). 
[ 6] Cho W.L., Deitsch K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). 
[ 7] Hu W.N., Kopachik W., Band R.N. Infect. Immun. 60:2418-2424(1992). 
[ 8] Wajant H., Mundry K.W., Pfitzenmaier K. Plant Mol. Biol. 26:735-746(1994). [ 9] 
Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).[E1] 

623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of 
structurally related proteins. They are high molecular weight (400 to 500 amino 
acids), extracellular, irreversible serine protease inhibitors with a well defined structural- 
functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine 
protease. This region is found in the C-terminal part of these proteins. Proteins which are 
known to belong to the serpin family are listed below (references are only provided for 
recently determined sequences): - Alpha-1 protease inhibitor (alpha-1-antitrypsin, 
contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - 
Heparin cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) 
and 2 (PAI-2). - Glia derived nexin (GDN) (Protease nexin I). - Protein C inhibitor. - Rat 
hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous cell carcinoma antigen 
(SCCA) which may act in the modulation of the host immune response against tumor cells. - 
A lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other 
serpins, is an intracellular protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen 
activators and plasmin. - Cowpox virus crmA [6], an inhibitor of the thiol protease 
interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a non- 
serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be 
involved in the regulation of the blood clotting cascade and/or of the complement cascade in 
the mammalian host. On the basis of strong sequence similarities, a number of proteins with 
no known inhibitory activity are said to belong to this family: - Birds ovalbumin and the 
related genes X and Y proteins. - Angiotensinogen; the precursor of the angiotensin active 
peptide. - Barley protein Z; the major endosperm albumin. - Corticosteroid binding globulin 
(CBG). - Thyroxine-binding globulin (TBG). - Sheep uterine milk protein (UTMP) and pig 
uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum heat-shock protein 
that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic 
pathway [7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment 
epithelium-derived factor precursor (PEDF), a protein with a strong neurotrophic activity [9]. - 
Ep45, an estrogen-regulated protein from Xenopus [10]. A signature pattern has been 
developed for this family of proteins, centered on a well conserved Pro-Phe sequence which 
is found ten to fifteen residues on the C-terminal side of the reactive bond. 

Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]- 
[LIVMFYC]-x-[LIVMFAH] 

[ 1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985). [ 2] Carrell R., Pemberton 
P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987). [ 3] Huber R., 
Carrell R.W. Biochemistry 28:8951-8966(1989). [ 4] Remold-O'Donnell E. FEBS Lett. 
315:105-108(1993). [ 5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger 
P. EMBO J. 15:2944-2953(1996). [ 6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., 
Thornberry N.A., Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994). [ 7] 
Clarke E., Sandwal B.D. Biochim. Biophys. Acta 1129:246-248(1992). [ 8] Zou Z., 
Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E., Thor A. 
Science 263:526-529(1994). [ 9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. 
Proc. Natl. Acad. Sci. U.S.A. 90:1526-1530(1993). [10] Holland L.J., Suksang C., Wall A.A., 
Roberts L.R., Moser D.R., Bhattacharya A. J. Biol. Chem. 267:7053-7059(1992). 

624. Sigma-54 interaction domain signatures and profile 

Some bacterial regulatory proteins activate the expression of genes from promoters 
recognized by core RNA polymerase associated with the alternative sigma-54 factor. These 
have a conserved domain of about 230 residues involved in the ATP-dependent [1,2] 
interaction with sigma-54. This domain has been found in the proteins listed below: - acoR 
from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB 
from Pseudomonas aeruginosa, an activator of alginate biosynthetic gene algD. - dctD from 
Rhizobium, an activator of dctA, the C4-dicarboxylate transport protein. - dhaR from 
Citrobacter freundii, a regulator of the dha operon for glycerol utilization. - fhlA from 
Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural 
genes. - flbD from Caulobacter crescentus, an activator of flagellar genes. - hoxA from 
Alcaligenes eutrophus, an activator of the hydrogenase operon. - hrpS from Pseudomonas 
syringae, an activator of hprD as well as other hrp loci involved in plant pathogenicity. - 
hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL. - 
hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase 
activity. - levR from Bacillus subtilis, which regulates the expression of the levanase operon 
(levDEFG and sacC). - nifA (as well as anfA and vnfA) from various bacteria, an activator of 
the nif nitrogen-fixing operon. - ntrC, from various bacteria, an activator of nitrogen 
assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA 
from Salmonella typhimurium, the activator of the inducible phosphoglycerate transport 
system. - pilR from Pseudomonas aeruginosa, an activator of pilin gene transcription. - rocR 
from Bacillus subtilis, an activator of genes for arginine utilization. - tyrR from Escherichia 
coli, involved in the transcriptional regulation of aromatic amino-acid biosynthesis and 
transport. - wtsA, from Erwinia stewartii, an activator of plant pathogenicity gene wtsB. - 
xylR from Pseudomonas putida, the activator of the tol plasmid xylene catabolism operon 
xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA. - Escherichia coli 
hypothetical protein yhgB. About half of these proteins (algB, dctD, flbD, hoxA, hupR1, 
hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems [3] and 
possess a domain that can be phosphorylated by a sensor-kinase protein in their N-terminal 
section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their 
C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase 
activity. This may be required to promote a conformational change necessary for 
the interaction [4]. The domain contains an atypical ATP-binding motif A (P-loop) as well as 
a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the 
domain; signature patterns have been developed for both motifs. Other regions of the domain 
are also conserved. One of them, located in the C-terminal section, has been selected as a 
third signature pattern. 

Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY] 
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]- 
[LIVMFY](3)-[DE]-[EK]-[LIVM] 

Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT] 
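
Since three separate signatures are given for this domain, a candidate sequence can be checked against all of them and the hits reported together. The following is a minimal sketch under stated assumptions: the regular expressions are hand translations of the three consensus patterns above, the names SIGNATURES, signature_hits and EXAMPLE are hypothetical, and the example sequence is a synthetic string assembled only to exercise the patterns.

```python
import re

# Hand translations of the three consensus patterns, in the order listed above.
SIGNATURES = {
    "pattern_1": r"[LIVMFY]{3}.G[DEQ][STE]G[STAV]GK.{2}[LIVMFY]",
    "pattern_2": r"[GS].[LIVMF].{2}A[DNEQASH][GNEK]G[STIM][LIVMFY]{3}[DE][EK][LIVM]",
    "pattern_3": r"[FYW]P[GS]N[LIVM]R[EQ]L.[NHAT]",
}


def signature_hits(sequence: str) -> dict:
    """Map each signature name to the 1-based start of its first match, or None."""
    hits = {}
    for name, regex in SIGNATURES.items():
        match = re.search(regex, sequence)
        hits[name] = match.start() + 1 if match else None
    return hits


# Synthetic test string (not a real protein): three motif-like stretches
# separated by alanine spacers.
EXAMPLE = "MK" + "LLLAGESGTGKELL" + "AAAA" + "GLFEQADGGTLFLDEI" + "AAAA" + "WPGNVRELEN"
print(signature_hits(EXAMPLE))
```

Requiring more than one signature to hit is one possible way to reduce chance matches; whether that is appropriate depends on how the patterns are being used.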

[ 1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993). [ 2] Austin S., Kundrot C., 
Dixon R. Nucleic Acids Res. 19:2281-2287(1991). [ 3] Albright L.M., Huala E., Ausubel 
F.M. Annu. Rev. Genet. 23:311-336(1989). [ 4] Austin S., Dixon R. EMBO J. 11:2219- 
2228(1992). 

625. Sigma-70 factors family signatures 

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of 
the core RNA polymerase to specific initiation sites and are then released. They alter the 
specificity of promoter recognition. Most bacteria express a multiplicity of sigma factors. 
Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma 
factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. 
The other sigma factors, known as alternative sigma factors, are required for the transcription 
of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped 
into two classes: the sigma-54 and sigma-70 families. The sigma-70 family includes, in 
addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed 
below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E 
(sigE or spoIIGB), sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or 
spoOC) and sigma-K (sigK or spoIVCB/spoIIIC). - Escherichia coli and related bacteria 
sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes. - Escherichia 
coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene. 
- Escherichia coli sigma-S (gene rpoS or katF) which seems to be involved in the expression 
of genes required for protection against external stresses. - Myxococcus xanthus sigma-B 
(sigB) which is essential for the late-stage differentiation of that bacterium. Alignments of the 
sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of 
these four regions can in turn be subdivided into a number of sub-regions. Signature patterns 
based on the two best-conserved sub-regions have been developed. The first pattern 
corresponds to sub-region 2.2; the exact function of this sub-region is not known although it 
could be involved in the binding of the sigma factor to the core RNA polymerase. The second 
pattern corresponds to sub-region 4.2 which seems to harbor a DNA-binding 'helix-turn-helix' 
motif involved in binding the conserved -35 region of promoters recognized by the major 
sigma factors. The second pattern starts one residue before the N-terminal extremity of the 
HTH region and ends six residues after its C-terminal extremity. 

Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x- 
[GSAM]-[LIVMAP] 

Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)- 
[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM] 
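
The statement that the second pattern starts one residue before the HTH region and ends six residues after it can be turned into simple arithmetic on the pattern length. The sketch below is illustrative only; the function name pattern_length_range and the constant names are hypothetical, and the result follows from taking the description above literally.

```python
import re


def pattern_length_range(pattern: str) -> tuple:
    """Return the (minimum, maximum) number of residues a consensus pattern spans."""
    low_total = high_total = 0
    for element in pattern.replace(" ", "").split("-"):
        if not element:
            continue
        # A trailing (n) or (n,m) gives the repeat count; otherwise the element is one residue.
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2) or repeat.group(1))
        else:
            low = high = 1
        low_total += low
        high_total += high
    return low_total, high_total


# Second sigma-70 pattern (sub-region 4.2) as quoted above.
SIGMA70_HTH = (
    "[STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-"
    "[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM]"
)

length, _ = pattern_length_range(SIGMA70_HTH)
hth_length = length - 1 - 6  # one residue before the HTH region, six after it
print(length, hth_length)
```

Under that reading the pattern spans 27 residues, which leaves 20 residues for the helix-turn-helix region itself, beginning at the second residue of any match.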

[ 1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [ 2] Gribskov 
M., Burgess R.R. Nucleic Acids Res. 14:6745-6763(1986). [ 3] Lonetto M.A., Gribskov M., 
Gross C.A. J. Bacteriol. 174:3843-3849(1992). [ 4] Lonetto M.A., Brown K.L., Rudd K.E., 
Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994). 

626. Signal carboxyl-terminal domain. 430 members. 

627. Signal peptidases I signatures 

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides 
from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB) 
which is responsible for the processing of the majority of exported pre-proteins; type II (gene 
lsp) which processes only lipoproteins; and a third type involved in the processing of pili 
subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic 
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with 
the main part of the protein protruding into the periplasmic space. Two residues have been 
shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I 
is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 
(genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the 
targeting of proteins from the mitochondrial matrix, across the inner membrane, into the 
inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an 
oligomeric enzymatic complex composed of at least five subunits: the signal peptidase 
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two 
components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits as well 
as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with 
prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these proteins have 
been developed. The first signature contains the putative active site serine, the second 
signature contains the putative active site lysine which is not conserved in the SPC subunits, 
and the third signature corresponds to a conserved region of unknown biological significance 
which is located in the C-terminal section of all these proteins. 
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue] 
Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an 
active site residue] 
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG] 
[ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [ 2] Sung M., 
Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992). [ 3] Black M.T. J. Bacteriol. 175:4957- 
4961(1993). [ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993). [ 5] van Dijl 
J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992). [ 6] 
Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).[E1] 

628. (sodcu) Copper/Zinc superoxide dismutase signatures 

Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that 
catalyzes the dismutation of superoxide radicals. SODC binds one atom each of zinc and 
copper. Various forms of SODC are known: a cytoplasmic form in eukaryotes, an additional 
chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic form 
in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. 
Two signature patterns have been derived for this family of enzymes: the first one contains 
two histidine residues that bind the copper atom; the second one is located in the C-terminal 
section of SODC and contains a cysteine which is involved in a disulfide bond. 
Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two 
H's are copper ligands] 
Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide 
bond] 

[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987).[ 
2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992). 

629. (sodfe) Manganese and iron superoxide dismutases signature 

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that 
catalyzes the dismutation of superoxide radicals. The four ligands of the manganese atom are 
conserved in all the known SODM sequences. These metal ligands are also conserved in the 
related iron form of superoxide dismutases [2,3]. A short conserved region which includes 
two of the four ligands: an aspartate and a histidine has been selected as a signature. 
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands] 
[ 1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987).[ 
2] Parker M.W., Blake C.C.F. FEBS Lett. 229:377-382(1988).[ 3] Smith M.W., Doolittle 
R.F. J. Mol. Evol. 34:175-184(1992). 

630. Spectrin repeat 

Spectrin repeats are found in several proteins involved in cytoskeletal structure. These 
include spectrin, alpha-actinin and dystrophin. The sequence repeat used in this family is 
taken from the structural repeat in reference [2]. The spectrin repeat forms a three helix 
bundle. The second helix is interrupted by proline in some sequences. 
Number of members: 898 

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 
1995;2:732-732. [2] Crystal structure of the repetitive segments of spectrin. Yan Y, 
Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science 1993;262:2027-2030. 

631. (subtilase) Streptomyces subtilisin-type inhibitors signature 

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] 
characterized by their strong activity toward subtilisin. They are collectively known as SSI's: 
Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit trypsin or chymotrypsin. In their 
mature secreted form, SSI's are proteins of about 110 residues with two conserved disulfide 
bonds. 

xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx 
              ************ 

'C': conserved cysteine involved in a disulfide bond. 
'#': active site residue. 
'*': position of the pattern. 

Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two Cs are involved in 
a disulfide bond] 

[ 1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911- 
918(1994). 



632. Sugar transport proteins signatures 

In mammalian cells the uptake of glucose is mediated by a family of closely related transport 
proteins which are called the glucose transporters [1,2,3]. At least seven of these transporters 
are currently known to exist (in Human they are encoded by the GLUT1 to GLUT7 
genes). These integral membrane proteins are predicted to comprise twelve membrane 
spanning domains. The glucose transporters show sequence similarities [4,5] with a number 
of other sugar or metabolite transport proteins listed below (references are only provided for 
recently determined sequences). - Escherichia coli arabinose-proton symport (araE). - 
Escherichia coli galactose-proton symport (galP). - Escherichia coli and Klebsiella 
pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit). 
- Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli 
proline/betaine transporter (gene proP) [6]. - Escherichia coli xylose-proton symport (xylE). - 
Zymomonas mobilis glucose facilitated diffusion protein (gene glf). - Yeast high and low 
affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose 
transporter (gene GAL2). - Yeast maltose permeases (genes MAL3T and MAL6T). - Yeast 
myo-inositol transporters (genes ITR1 and ITR2). - Yeast carboxylic acid transporter protein 
homolog JEN1. - Yeast inorganic phosphate transporter (gene PHO84). - Kluyveromyces 
lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene Qa-y), 
and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene 
HUP1). - Arabidopsis thaliana glucose transporter (gene STP1). - Spinach sucrose 
transporter. - Leishmania donovani transporters D1 and D2. - Leishmania enriettii probable 
transport protein (LTP). - Yeast hypothetical proteins YBR241c, YCR98c and YFL040w. - 
Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins 
yabE, ydjE and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - 
Bacillus subtilis hypothetical proteins yxbC and yxdF. It has been suggested [4] that these 
transport proteins have evolved from the duplication of an ancestral protein with six 
transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. 
The first one is located between the second and third transmembrane domains and the second 
one between transmembrane domains 8 and 9. Two patterns have been developed to detect 
this family of proteins. The first pattern is based on the G-R-[KR] motif; but because this 
motif is too short to be specific to this family of proteins, a pattern from a larger region 
centered on the second copy of this motif was derived. The second pattern is based on a 
number of conserved residues which are located at the end of the fourth transmembrane 
segment and in the short loop region between the fourth and fifth segments. 
Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]- 
G-R-[RK]-x(4,6)-[GSTA] 

Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK] 

[ 1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [ 2] Gould G.W., Bell G.I. Trends 
Biochem. Sci. 15:18-23(1990). [ 3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [ 
4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore D.C.M., Henderson P.J.F. Nature 
325:641-643(1987). [ 5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991). [ 6] 
Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood 
J.M. J. Mol. Biol. 229:268-276(1993). 



633. Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function 
is not yet known, but which is highly conserved in mammals, electric ray (where it is known 
as VAMP-1), Drosophila and yeast [2]. In yeast there are two closely related forms of 
synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least four (genes SYB1, 
SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an N-terminal cytoplasmic 
domain of 90 to 110 residues, followed by a transmembrane region, and then by a short 
(from 2 to 22 residues) C-terminal intravesicular domain. As a signature pattern for 
synaptobrevin, a highly conserved stretch of residues located in the central part of the 
sequence was selected. 

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x- 
[LIVM]-x-[DE]-[KR]-[TA]-[DE] 

[ 1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [ 2] Gerst 
J.E., Rodgers L., Riggs M., Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992). 

634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, 
which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are 
GTPase activator proteins of Rab-like small GTPases. Number of members: 55 

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the 
tre-2 oncogene and the yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; 
Oncogene 1995;11:1139-1148. 

[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and 
Ypt/Rab-specific GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244. 

635. Transcription factor TFIID repeat signature (TBP) 

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that 
plays a major role in the activation of eukaryotic genes transcribed by RNA polymerase II. 
TFIID binds specifically to the TATA box promoter element which lies close to the position 
of transcription initiation. There is a remarkable degree of sequence conservation of a C- 
terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region 
is necessary and sufficient for TATA box binding. The most significant structural feature of 
this domain is the presence of two conserved repeats of a 77 amino-acid region. The 
intramolecular symmetry generates a saddle-shaped structure that sits astride the DNA [3]. 
Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also 
binds to the TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP 
homolog [5]. A signature pattern that spans the last 50 residues of the repeated region has 
been derived. 

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L- 
[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM 
[ 1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. 
Nature 346:387-390(1990). [ 2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.- 
H. Nature 346:390-394(1990). [ 3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A., 
Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992). [ 4] Crowley 
T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993). [ 5] Marsh 
T.L., Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180- 
4184(1994). 

636. Translationally controlled tumor protein signatures (TCTP) 

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has 
been found to be preferentially synthesized in cells during the early growth phase of some 
types of tumor [1,2], but which is also expressed in normal cells. The physiological function 
of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close homologs have 
been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra, budding 
yeast (YKL056c) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions 
have been selected as signature patterns for TCTP. 

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA] 
Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]- 
[AV]-x(3)-[FYW] 

[ 1] Boehm H., Benndorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. 
Biochem. Int. 19:277-286(1989). [ 2] Makrides S., Chitpatima S.T., Bandyopadhyay R., 
Brawerman G. Nucleic Acids Res. 16:2350-2350(1988). [ 3] Pay A., Heberle-Bors E., Hirt H. 
Plant Mol. Biol. 19:501-503(1992). [ 4] Stuerzenbaum S.R., Kille P., Morgan A.J. Biochim. 
Biophys. Acta 1398:294-304(1998). [ 5] Rasmussen S.W. Yeast 10:S63-S68(1994). 

637. TFIIS zinc ribbon domain signature 

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA 
polymerase II transcription elongation, past template-encoded pause sites. TFIIS shows 
DNA-binding activity only in the presence of RNA polymerase II. It is a protein of about 300 
amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where it 
was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA 
strand transfer protein alpha [2]) and in the archaebacterium Sulfolobus acidocaldarius [3]. This 
family also includes the eukaryotic and archaebacterial RNA polymerase subunits of the 15 
Kd / M family (see <PDOC00790>) as well as the following viral proteins: - Vaccinia virus 
RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus protein I243L 
[5]. The best conserved region of all these proteins contains four cysteines that bind a zinc ion 
and fold in a conformation termed a 'zinc ribbon' [6]. Besides these cysteines, there are a 
number of other conserved residues which can be used to help define a specific pattern for 
this type of domain. 

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]- 
[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x(3)-[FW] [The four Cs are zinc ligands] 
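
Because the four cysteines in this pattern are annotated as zinc ligands, a scan can report their positions directly by placing each cysteine in a capture group. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above, and the function name `zinc_ligand_positions` and the test fragment are hypothetical.

```python
import re

# Hand translation of the zinc ribbon consensus pattern; each ligand C is captured.
ZINC_RIBBON = re.compile(
    r"(C).{2}(C).{9}[LIVMQSAR][QH][STQL][RA][SACR].[DE][DET][PGSEA]"
    r".{6}(C).{2,5}(C).{3}[FW]"
)

def zinc_ligand_positions(sequence):
    """Return 1-based positions of the four cysteine ligands, or [] if no match."""
    m = ZINC_RIBBON.search(sequence)
    if m is None:
        return []
    return [m.start(g) + 1 for g in range(1, 5)]

# Hypothetical fragment built to satisfy the pattern; not a real TFIIS sequence.
fragment = "MK" + "CAAC" + "A" * 9 + "LQSRSADDP" + "A" * 6 + "CAAC" + "AAA" + "F"
print(zinc_ligand_positions(fragment))
```
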
[ 1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988).[ 2]
Kipling D., Kearsey S.E. Nature 353:509-509(1991).[ 3] Langer D., Zillig W. Nucleic Acids
Res. 21:2251-2251(1993).[ 4] Ahn B.Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol.
10:5433-5441(1990).[ 5] Rodriguez J.M., Salas M.L., Vinuela E. Virology 186:40-52(1992).
[ 6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).

638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH) 
Enzymes that participate in the transfer of one-carbon units are involved in various
biosynthetic pathways. In many of these processes the transfers of one-carbon units are
mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon
derivatives of THF which can be interconverted between different oxidation states by
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase
(EC 1.5.1.5 or EC 1.5.1.15) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The
dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional
enzymes:
- Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which catalyzes all three
reactions described above. Two forms of C1-THF synthase are known [1]: one is located in
the mitochondrial matrix, while the second is cytoplasmic. In both forms the
dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the 900
amino-acid protein and consists of about 300 amino acid residues. The C1-THF synthases are
NADP-dependent.
- Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2]. This is a
homodimeric NAD-dependent enzyme of about 300 amino acid residues.
- Bacterial folD [3]. FolD is a homodimeric bifunctional NADP-dependent enzyme of about
290 amino acid residues.
The sequence of the dehydrogenase/cyclohydrolase domain is
highly conserved in all forms of the enzyme. Two conserved regions have been selected as
signature patterns. The first one is located in the N-terminal part of these enzymes and
contains three acidic residues. The second pattern is a highly conserved sequence of 9 amino
acids which is located in the C-terminal section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-
[LIVMF](3)-Q-L-P-[LV]
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).[ 2] Belanger C.,
Mackenzie R.E. J. Biol. Chem. 264:4837-4843(1989).[ 3] d'Ari L., Rabinowitz J.C. J. Biol. 
Chem. 266:23953-23958(1991). 

639. Triosephosphate isomerase active site (TIM) 

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the 
reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. 
TIM plays an important role in several metabolic pathways and is essential for efficient 
energy production. It is a dimer of identical subunits, each of which is made up of about 250 
amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The 
sequence around the active site residue is perfectly conserved in all known TIMs and can be
used as a signature pattern for this type of enzyme. 

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue] 
[ 1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 
29:6609-6618(1990).[ 2] Knowles J.R. Nature 350:121-124(1991). 

640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent
phosphorylation of thymidine. A comparison of TK sequences has shown [1,2,3]
that there are two different families of TK. One family groups together TK from herpes
viruses as well as cellular thymidylate kinases, while the second family currently consists of
TK from the following sources:
- Vertebrates.
- Bacteria.
- Bacteriophage T4.
- Pox viruses.
- African swine fever virus (ASF).
- Fish lymphocystis disease virus (FLDV).
A conserved region which is located in the C-terminal section of these enzymes has been
selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[ 1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology
156:355-365(1987).[ 2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C.,
Vinuela E. Virology 178:301-304(1990).[ 3] Robertson G.R., Whalley J.M. Nucleic Acids
Res. 16:11303-11317(1988).

641. Thymidine kinase from herpesvirus (TK herpes)
[1] 

Medline: 96003730 

Crystal structures of the thymidine kinase from herpes 
simplex virus type-1 in complex with deoxythymidine and
ganciclovir. 

Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz 
C, Summers WC, Sanderson MR; 
Nat Struct Biol 1995;2:876-881. 
Number of members: 65 

642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of 
nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form 
found in the sperm nucleus. This condensation is associated with a double-protein transition. 
The first transition corresponds to the replacement of histones by several spermatid-specific 
proteins, also called transition proteins, which are themselves replaced by protamines during 
the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific 
proteins. TP2 is a basic, zinc-binding protein [1] of 116 to 137 amino-acid residues. 
Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain 
of about 25 residues, a variable central domain of 20 to 50 residues which contains cysteine 
residues, and a conserved C-terminal domain of about 70 residues rich in lysines and 
arginines. Two signature patterns for TP2 have been developed: one located in the N-terminal 
domain, the other in the C-terminal. 
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S 
Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K 
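
Since TP2 is described by two signature patterns, one in the N-terminal domain and one in the C-terminal domain, a scan can require both to occur in that order. The sketch below shows one way to do this, assuming hand-translated regular expressions for the two consensus patterns above. The function name `tp2_signature_hits` and the test fragment are hypothetical.

```python
import re

# Hand translations of the two TP2 consensus patterns.
TP2_N_TERM = re.compile(r"H.{3}HS[NS]S.PQS")
TP2_C_TERM = re.compile(r"K.RK.{2}EGK.{2}K[KR]K")

def tp2_signature_hits(sequence):
    """Return 1-based starts (n_term, c_term) if both patterns occur in order, else None."""
    n_hit = TP2_N_TERM.search(sequence)
    if n_hit is None:
        return None
    # Only look for the C-terminal pattern downstream of the N-terminal hit.
    c_hit = TP2_C_TERM.search(sequence, n_hit.end())
    if c_hit is None:
        return None
    return n_hit.start() + 1, c_hit.start() + 1

# Hypothetical fragment built to satisfy both patterns; not a real TP2 sequence.
fragment = "M" + "HAAAHSNSAPQS" + "G" * 10 + "KARKAAEGKAAKKK"
print(tp2_signature_hits(fragment))
```

Requiring the expected order is a design choice made for this sketch; a more permissive scan could report each pattern independently.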

[ 1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).

643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It
has been shown [1] that some of these enzymes are structurally related. These related TPP
enzymes are:
- Pyruvate oxidase (POX) (EC 1.2.3.3). Reaction catalyzed: pyruvate + orthophosphate +
O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2).
- Pyruvate decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde +
CO(2).
- Indolepyruvate decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate =
indole-3-acetaldehyde + CO(2).
- Acetolactate synthase (ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate +
CO(2).
- Benzoylformate decarboxylase (BFD) (EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate =
benzaldehyde + CO(2).
A conserved region which is located in their C-terminal section has been selected as a
signature pattern for these enzymes.

Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]- 
[GSAC] 

[ 1] Green J.B.A. FEBS Lett. 246:1-5(1989).[ 2] Koga J., Adachi T., Hidaka H. Mol. Gen.
Genet. 226:10-16(1991).[ 3] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt 
P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990). 

644. TPR Domain 
[1] 

Medline: 95397415 

Tetratrico peptide repeat interactions: to TPR or not to TPR? 
Lamb JR, Tugendreich S, Hieter P; 

Trends Biochem Sci 1995;20:257-259. 

[2]Medline: 98151343 
The structure of the tetratricopeptide repeats of protein 
phosphatase 5: implications for TPR-mediated protein-protein 
interactions. 
Das AK, Cohen PW, Barford D; 

EMBO J 1998;17:1192-1199. 
Number of members: 621 

645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)
Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of
two methyl groups from S-adenosyl-L-methionine to the C-2 and C-7 atoms of
uroporphyrinogen III to yield precorrin-2 via the intermediate formation of precorrin-1.
SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common
intermediate in the biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme
F430. The sequences of SUMT from a variety of eubacterial and archaebacterial species are
currently available. In species such as Bacillus megaterium (gene cobA), Pseudomonas
denitrificans (gene cobA) or Methanobacterium ivanovii (gene corA), SUMT is a protein of about
25 to 30 Kd. In Escherichia coli and related bacteria, the cysG protein, which is involved in
the biosynthesis of siroheme, is a multifunctional protein composed of an N-terminal domain,
probably involved in transforming precorrin-2 into siroheme, and a C-terminal domain which
has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also
seem to be SAM-dependent methyltransferases [3,4]. The similarity is especially strong with
two of these enzymes: cobI/cbiL, which encodes S-adenosyl-L-methionine:precorrin-2
methyltransferase, and cobM/cbiF, whose exact function is not known. Two signature patterns
have been developed for these enzymes. The first corresponds to a well conserved region in
the N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region
located in the central part of these proteins (this pattern spans what are called regions 2 and 3
in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]- 
[KRHQG]-[AG] 

Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)- 
[LIVMFYWPAC]-x-[LIVMY]-x-P-G 

[ 1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J.
Bacteriol. 173:4637-4645(1991).[ 2] Robin C., Blanche F., Cauchois L., Cameron B., Couder
M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).[ 3] Crouzet J., Cameron B., Cauchois L.,
Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990).[ 4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J.
Bacteriol. 175:3303-3316(1993).[ 5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell.
Biol. 12:4026-4037(1992).

646. Tudor domain 

Domain of unknown function present in several RNA-binding proteins, with copies in the
Drosophila Tudor protein. Slight ambiguities in the alignment.
Number of members: 18

[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP;
Trends Biochem Sci 1997;22:51-52.
[2] Medline: 97157029 The human EBNA-2
coactivator p100: multidomain organization and relationship to the staphylococcal nuclease
fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.

647. Terpene synthase family 
It has been suggested that this gene family be designated
tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf.
tpsa includes vetispiradiene synthase, Swiss:Q39979, 5-epi-aristolochene
synthase, Swiss:Q40577 and (+)-delta-cadinene
synthase, Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase,
Swiss:O24475 and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51 

[1] 

Medline: 97413772 

Monoterpene synthases from grand fir (Abies grandis). cDNA 
isolation, characterization, and functional expression of 
myrcene synthase, (-)-(4S)-limonene synthase, and 
(-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R; 
J Biol Chem 1997;272:21784-21792. 

648. ThiF family 
This family contains a repeated domain in ubiquitin
activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family.
Number of members: 87

649. Thioester dehydrase 

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19 

[1]

Medline: 96398612 

Structure of a dehydratase-isomerase from the bacterial 
pathway for biosynthesis of unsaturated fatty acids: two 
catalytic activities in one active site. 
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL; 
Structure 1996;4:253-264. 
Database reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM]
Database reference: PFAMB; PB058036; 



650. Tub family signatures 

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and
sensory deficits. This mutation maps to a gene, tub [1,2], which codes for a protein that
belongs to a family which currently consists of the following members:
- Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in
the hypothalamic regulation of body weight.
- Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal
degeneration disease.
- Mouse protein p4-6, whose function is not known.
- Caenorhabditis elegans hypothetical protein F10B5.4.
- Several fragmentary sequences from plants, Drosophila and human ESTs.
While the N-terminal part of these proteins is conserved neither in length nor in sequence,
the C-terminal 250 residues are highly
conserved. Therefore, two regions were selected in the C-terminal part as signature patterns.
The second region is located at the C-terminal extremity and contains a penultimate cysteine
residue that could be critical to the normal functioning of these proteins.
Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q 
Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E 
[ 1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R.,
Misumi D.J., Holmgren L., Charlat O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F.,
Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J., Lakey N.D., Culpepper J., Chen
H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996).
[ 2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996).[ 3]
North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci.
U.S.A. 94:3128-3133(1997).

651. Eukaryotic DNA topoisomerase I active site 

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type I topoisomerases act by
catalyzing the transient breakage of DNA, one strand at a time, and the subsequent rejoining
of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it
simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is
joined to a 3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In
eukaryotic and pox virus topoisomerases I, there are a number of conserved residues in the
region around the active site tyrosine.

Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active 
site tyrosine] 

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).[ 2] Sharma A., Mondragon A.
Curr. Opin. Struct. Biol. 5:39-47(1995).[ 3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang
J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989).[ 4] Roca J. Trends Biochem. Sci.
20:156-160(1995).[E1]

652. Transaldolase signatures 

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from
sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and
fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the
glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 Kd
whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the
carbonyl group of fructose-6-phosphate. Transaldolase is evolutionarily related [2] to a
bacterial protein of about 20 Kd (known as talC in Escherichia coli), whose exact function is
not yet known. Two signature patterns have been developed for these proteins. The first,
located in the N-terminal section, contains a perfectly conserved pentapeptide; the second
includes the active site lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2) 

Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-
x-[AGV]-x-[QEKRST]-x-[LIVM] [K is the active site residue]

[ 1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241- 
1249(1993). [ 2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995). 

653. (Transpeptidase) Penicillin binding protein transpeptidase domain 

The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this
family.

[1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O Nat Struct Biol 1996;3:284-289. 

654. Trehalase signatures 

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide
alpha, alpha-trehalose yielding two glucose subunits [1]. It is an enzyme found in a wide 
variety of organisms and whose sequence has been highly conserved throughout evolution. 
Two of the most highly conserved regions have been selected as signature patterns. The first 
pattern is located in the central section, the second one is in the C-terminal region. 
Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y 
Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P 

[ 1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993).[ 2] Henrissat B., 
Bairoch A. Biochem. J. 293:781-788(1993).[E1] 

655. Trehalose-6-phosphate synthase domain 

OtsA (Trehalose-6-phosphate synthase) is homologous to regions
in the subunits of the yeast trehalose-6-phosphate synthase/phosphatase complex [1].
[1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.

656. Tropomyosins signature

Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle
cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex
and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and
non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a
coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid
residues and a highly conserved N-terminal region, whereas non-muscle forms are generally
smaller and are heterogeneous in their N-terminal region. The signature pattern for
tropomyosins is based on a very conserved region in the C-terminal section of tropomyosins,
which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E 

[ 1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979).[ 2] McLeod A.R. BioEssays
6:208-212(1986).

657. Troponin 

Troponin (Tn) contains three subunits: Ca2+ binding (TnC),
inhibitory (TnI), and tropomyosin binding (TnT). This Pfam family contains
members of the TnT subunit.

Troponin is a complex of three proteins: Ca2+ binding (TnC),
inhibitory (TnI), and tropomyosin binding (TnT).
The troponin complex regulates Ca2+-induced muscle contraction.
This family includes troponin T and troponin I. Troponin I
binds to actin and troponin T binds to tropomyosin.
Number of members: 81

[1] Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315
A direct regulatory role for troponin T and a dual role for
troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.

658. (Tryp mucin) Mucin-like glycoprotein 

This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of
three regions. The N and C termini are conserved between all members of the family,
whereas the central region is not well conserved and contains a large number of threonine
residues which can be glycosylated [1].

Indirect evidence suggested that these genes might encode the core protein of parasite 
mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion 
of, mammalian host cells. 

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149. 

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1) 
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2]
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well 
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site.
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I 
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G- 
[HNTG]-[LIVMFYSTAGPC] 

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Webster T., Tsai H., Kula M., 
Mackie G.A., Schimmel P. Science 226:1315-1317(1984).[ 3] Brick P., Bhat T.N., Blow 
D.M. J. Mol. Biol. 208:83-98(1988).[ 4] Delarue M., Moras D. BioEssays 15:675- 
687(1993).[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 6] Nagel G.M., Doolittle 
R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). 

660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] 
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. 
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I 
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold. 
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-
[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Webster T., Tsai H., Kula M.,
Mackie G.A., Schimmel P. Science 226:1315-1317(1984).[ 3] Brick P., Bhat T.N., Blow 
D.M. J. Mol. Biol. 208:83-98(1988).[ 4] Delarue M., Moras D. BioEssays 15:675- 
687(1993).[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 6] Nagel G.M., Doolittle 
R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only glutamyl and glutaminyl tRNA synthetases.
In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and
tRNA(Gln).

[1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449. 

662. (tRNA-synt 1d) tRNA synthetases class I (R)

Other tRNA synthetase sub-families are too dissimilar to be included. 
This family includes only arginyl tRNA synthetase. 

663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2) 
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, 
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common 
folding pattern in their catalytic domain for the binding of ATP and amino acid which is 
different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity, however at least three conserved regions 
are present [2,5,8]. Signature patterns have been derived from two of these regions. 
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-
[LIVMSTAG]-[LIVMFY]
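
The second class-II pattern, read as [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY], contains an exclusion element: the braces mean any residue except those listed. The sketch below shows one way to expand such an element into an explicit allowed set over the 20 standard amino acids, which avoids accepting non-residue characters as a plain negated class would. The names `allowed`, `CLASS_II_2` and `matches_class_ii_motif`, and the test peptides, are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
EXCLUDED = "DENQHRKP"                   # contents of the {..} element

# Expand {DENQHRKP} into the residues that remain allowed at that position.
allowed = "".join(aa for aa in AMINO_ACIDS if aa not in EXCLUDED)

CLASS_II_2 = re.compile(
    "[GSTALVF][" + allowed + "][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]"
)

def matches_class_ii_motif(peptide):
    """True if the whole peptide matches the second class-II pattern."""
    return CLASS_II_2.fullmatch(peptide) is not None

# Hypothetical 10-residue peptides; the second has an excluded D at position 2.
print(allowed)
print(matches_class_ii_motif("GAGLDRLALL"))
print(matches_class_ii_motif("GDGLDRLALL"))
```
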

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Delarue M., Moras D. 
BioEssays 15:675-687(1993).[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 4] Nagel 
G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [ 5] Cusack S., 
Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).[ 6] Cusack S. 
Biochimie 75:1077-1081(1993).[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar
N., Leberman R. Nature 347:249-255(1990).[ 8] Leveque F., Plateau P., Dessen P., Blanquet 
S. Nucleic Acids Res. 18:305-312(1990). 

664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] 
that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal 
section, in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well 
conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site.
The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for
arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I 
synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold. 
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-
[HNTG]-[LIVMFYSTAGPC]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Webster T., Tsai H., Kula M., 
Mackie G.A., Schimmel P. Science 226:1315-1317(1984).[ 3] Brick P., Bhat T.N., Blow 
D.M. J. Mol. Biol. 208:83-98(1988).[ 4] Delarue M., Moras D. BioEssays 15:675- 
687(1993).[ 5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 6] Nagel G.M., Doolittle 
R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino 
acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, 
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common 
folding pattern in their catalytic domain for the binding of ATP and amino acid which is 
different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity, however at least three conserved regions 
are present [2,5,8]. Signature patterns have been derived from two of these regions. 
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-
[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Delarue M., Moras D. 
BioEssays 15:675-687(1993).[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 4] Nagel 
G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [ 5] Cusack S., 
Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).[ 6] Cusack S. 
Biochimie 75:1077-1081(1993).[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar
N., Leberman R. Nature 347:249-255(1990).[ 8] Leveque F., Plateau P., Dessen P., Blanquet 
S. Nucleic Acids Res. 18:305-312(1990). 

666. Thaumatin family signature 

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on
a molar basis) from Thaumatococcus daniellii, an African bush. The protein is made of about
200 residues and contains 8 disulfide bonds. A number of proteins have been found to be
related to thaumatins. These proteins are listed below (references are only provided for
recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco
pathogenesis-related proteins: PR-R major and minor forms, which are induced after
infection with viruses. - Salt-induced protein NP24 from tomato. - Osmotin, a salt-induced
protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and OSML81 from potato
[2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a
maize antifungal protein [3]. The exact biological function of all these proteins is not yet
known. A conserved region that includes three cysteine residues known (in thaumatin) to be
involved in disulfide bonds has been selected as a signature pattern.

[Schematic of the cysteine positions in thaumatin; the disulfide-bond connectors and the '*'
markers giving the position of the pattern did not survive extraction.]

xxCxxxxxxxxxxxxxxxxCxxCxxCxCxxxxxxxxxxxxxxCxxCxCxxxCxCxxCCxCxxxCxxxxxCxxxCx

'C': conserved cysteine involved in a disulfide bond.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

[ 1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C.,
Verrips C.T. Gene 18:1-12(1982).[ 2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol.
108:929-937(1995).[ 3] Malehorn D.E., Borgmeyer J.R., Smith C.E., Shah D.M. Plant Physiol.
106:1471-1481(1994).



667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes:
acetoacetyl-CoA thiolase (EC 2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16).
3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its
substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of
acetoacetyl-CoA and is involved in biosynthetic pathways such as poly beta-hydroxybutyrate
synthesis or steroid biogenesis. In
eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion
and the other in peroxisomes. There are two conserved cysteine residues important for
thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the
formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is
the active site base involved in deprotonation in the condensation reaction. Mammalian
nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein
which seems to exist in two different forms: a 14 Kd protein (SCP-2) and a larger 58 Kd
protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in
lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to
SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4]. Three signature
patterns have been developed for this family of proteins, two of which are based on the
regions around the biologically important cysteines. The third is based on a highly conserved
region in the C-terminal part of these proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-
[LIVM]-x(6)-[LIVM] [C is involved in formation of acyl-enzyme intermediate]
Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G
Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-
x-[AG]-x-[SAG] [C is the active site residue]

[ 1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989).[ 2] Yang S.-Y., Yang
X.-Y.H., Healy-Louie G., Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990).[ 3]
Igual J.C., Gonzalez-Bosch C., Dopazo J., Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992).[ 4]
Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698(1991).

668. Thioredoxin family active site 

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues 
which participate in various redox reactions via the reversible oxidation of an active center 
disulfide bond. They exist in either a reduced form or an oxidized form where the two 
cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present in 
prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is 
well conserved. Bacteriophage T4 also encodes a thioredoxin but its primary structure is
not homologous to bacterial, plant and vertebrate thioredoxins. A number of eukaryotic
proteins contain domains evolutionarily related to thioredoxin; all of them seem to be protein
disulfide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme
that catalyzes the rearrangement of disulfide bonds in various proteins. The various forms of
PDI which are currently known are: - PDI major isozyme; a multifunctional protein that also
functions as the beta subunit of prolyl 4-hydroxylase (EC 1.14.11.2), as a component of
oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein.
- ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a
phosphoinositide-specific phospholipase C isozyme and later to be a protease. - ERp72. -
P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol:disulfide interchange proteins and allow disulfide bond formation in
some periplasmic proteins also contain a thioredoxin domain. These proteins are: -
Escherichia coli dsbA (or prfA) and its orthologs in Vibrio cholerae (tcpG) and Haemophilus
influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia chrysanthemi
and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus
influenzae ortholog. - Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus
influenzae, Rhodobacter capsulatus (helX), Rhizobiaceae (cycY and tlpA).
Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-
C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT] [The two Cs form the redox-active
bond]
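
As an illustration of how the active-site pattern above locates the two redox-active cysteines, the sketch below hand-translates the pattern into a regular expression with one capture group per cysteine. The names THIOREDOXIN_SITE and find_redox_cysteines, and the toy fragment, are assumptions made for this example; the fragment is a made-up test string, not a sequence taken from this document.

```python
import re

# Hand translation of the consensus pattern above: 'x' becomes '.', x(n) becomes '.{n}'.
# The two cysteines are captured so that their positions can be reported.
THIOREDOXIN_SITE = re.compile(
    r"[LIVMF][LIVMSTA].[LIVMFYC][FYWSTHE].{2}[FYWGTN]"
    r"(C)[GATPLVE][PHYWSTA](C).{6}[LIVMFYWT]"
)

def find_redox_cysteines(sequence: str):
    """Return (first_cys, second_cys) 1-based positions for every pattern match."""
    return [(m.start(1) + 1, m.start(2) + 1)
            for m in THIOREDOXIN_SITE.finditer(sequence)]

# Hypothetical fragment containing a Cys-Gly-Pro-Cys motif.
toy_fragment = "MSDKAILVDFWAEWCGPCKMIAPILDEIA"
positions = find_redox_cysteines(toy_fragment)      # cysteines at 15 and 18

# Replacing the second cysteine with serine removes the match.
mutant_positions = find_redox_cysteines(toy_fragment.replace("CGPC", "CGPS"))
```

Reporting both cysteine positions, rather than only the start of the match, reflects the note attached to the pattern that the two Cs form the redox-active bond.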

[ 1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985).[ 2] Gleason F.K., Holmgren A.
FEMS Microbiol. Rev. 54:271-297(1988).[ 3] Holmgren A. J. Biol. Chem.
264:13963-13966(1989).[ 4] Eklund H., Gleason F.K., Holmgren A. Proteins 11:13-28(1991).[ 5]
Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).[ 6]
Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989).[ 7] Freedman
R.B., Hirst T.R., Tuite M.F. Trends Biochem. Sci. 19:331-336(1994).

669. (Transcript fac2) Transcription factor TFIIB repeat signature 

In eukaryotes the initiation of transcription of protein-encoding genes by polymerase II is
modulated by general and specific transcription factors. The general transcription factors
operate through common promoter elements (such as the TATA box). At least seven
different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID, -IIE,
-IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the
transcription of class II genes: it associates with a complex of TFIID-IIA bound to DNA (DA
complex) to form a ternary complex TFIID-IIA-IIB (DAB complex) which is then
recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75
residues. This repeat could contribute an element of symmetry to the folded protein. The
following proteins have been shown to be evolutionarily related to TFIIB: - An archaebacterial
TFIIB homolog. In Pyrococcus woesei a previously undetected open reading frame has been
shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit
(gene PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III
transcription and plays a role analogous to that of TFIIB in pol III transcription. The central
section of the repeated domain, which is the most conserved part of that domain, has been
selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-
[LIVMFY]-[LIVMA]-[GSA]-[STAC]

[ 1] Weinmann R. Gene Expr. 2:81-91(1992).[ 2] Hawley D. Trends Biochem. Sci.
16:317-318(1991).[ 3] Ha I., Lane W.S., Reinberg D. Nature 352:689-695(1991).[ 4] Ouzounis C.,
Sander C. Cell 71:189-190(1992).[ 5] Khoo B., Brophy B., Jackson S.P. Genes Dev.
8:2879-2890(1994).

670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues,
sometimes known as the MADS-box domain [E1]. They are listed below: - Serum response
factor (SRF) [1], a mammalian transcription factor that binds to the Serum Response Element
(SRE). This is a short sequence of dyad symmetry located 300 bp to the 5' end of the
transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer
factors 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors which bind
specifically to the MEF2 element present in the regulatory regions of many muscle-specific
genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). - Yeast GRM/PRTF protein
(gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor
RLM1. - Yeast transcription factor SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a
probable transcription factor involved in regulating genes that determine stamen and carpel
development in wild-type flowers. Mutations in the AG gene result in the replacement of the
stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins
Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI) which act locally to specify the identity of
the floral meristem and to determine sepal and petal development [4]. - Antirrhinum majus
and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are
transcription factors involved in the genetic control of flower development. Mutations in
DEFA or GLO cause the transformation of petals into sepals and of stamens into carpels. -
Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus
morphogenetic protein DEF H33 (squamosa). In SRF, the conserved domain has been shown
[1] to be involved in DNA-binding and dimerization. A pattern that spans the complete length
of the domain has been derived. The profile also spans the length of the MADS-box.

Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-
[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-
[FY]

[ 1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988).[ 2]
Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988).[ 3]
Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A., Meyerowitz E.M. Nature
346:35-39(1990).[ 4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994).[ 5]
Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer
H., Schwartz-Sommer Z. EMBO J. 11:4693-4704(1992).[ 6] Ma H., Yanofsky M.F.,
Meyerowitz E.M. Genes Dev. 5:484-495(1991).[E1]

671. Transketolase signatures 

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit 
from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form
sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with 
transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK 
requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it 
is a homodimer of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic 
and prokaryotic sources [1,2] show that the enzyme has been evolutionarily conserved. In the 
peroxisomes of methylotrophic yeast Hansenula polymorpha, there is a highly related 
enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst
its substrates. 1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far
found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamin
pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of
pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (dxp), a
precursor in the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol
(vitamin B6). DXP synthase is evolutionarily related to TK. Two regions of TK have been
selected as signature patterns. The first, located in the N-terminal section, contains a histidine 
residue which appears to function in proton transfer during catalysis [4]. The second, located
in the central section, contains conserved acidic residues that are part of the active cleft and
may participate in substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-
[GSTA]-x(2)-[LIMC]-[GS]

Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]- 
[DEFYW]-x(2)-[STAP]-x(2)-[RGA] 

[ 1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. 
Commun. 183:1159-1166(1992).[ 2] Fletcher T.S., Kwee I.L., Nakada T., Largman C, 
Martin B.M. Biochemistry 31:1892-1896(1992).[ 3] Sprenger G.A., Schorken U., Wiegert T., 
Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm H. Proc. Natl.
Acad. Sci. U.S.A. 94:12857-12862(1997).[ 4] Lindqvist Y., Schneider G., Ermler U.,
Sundstroem M. EMBO J. 11:2373-2379(1992). 

672. Transmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily
related [1,2,3]. The proteins known to belong to this family are listed below: - Mammalian
antigen CD9 (MIC3), a protein involved in platelet activation and aggregation. - Mammalian
leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen CD53 
(OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian 
lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1). - 
Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role 
in the regulation of lymphoma cell growth. - Mammalian antigen CD82 (protein R2; antigen 
C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory 
signals for the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial 
tetraspan antigen 3 (PETA-3)). - Mammalian cell surface glycoprotein A15 (TALLA-1; 
MXS1). - Mammalian novel antigen 2 (NAG-2). - Human tumor-associated antigen CO-029. 
- Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23). These proteins
share the following characteristics: they all seem to be type III membrane proteins (type III
proteins are integral membrane proteins that contain an N-terminal membrane-anchoring
domain which is not cleaved during biosynthesis and which functions both as a translocation
signal and as a membrane anchor); they also contain three additional transmembrane regions,
at least seven conserved cysteine residues, and are of approximately the same size (218 to
284 residues). These proteins are collectively known as the 'transmembrane 4 super family'
(TM4) because they span the plasma membrane four times. A schematic diagram of the
domain structure of these proteins is shown below.

| TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt |

[The 'C' (conserved cysteine) and '*' (position of the pattern) markers of the original
schematic did not survive extraction.]
Cyt: cytoplasmic domain. TMa: transmembrane anchor. TM2 to TM4: transmembrane regions
2 to 4.

A conserved region that includes two cysteines and seems to be located in a short 
cytoplasmic loop between two transmembrane domains has been selected as a signature for 
these proteins. 

Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-
[EG]-x(2)-[CWN]-[LIVM](2)

[ 1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem.
266:14597-14602(1991).[ 2] Tomlinson M.G., Williams A.F., Wright M.D. Eur. J. Immunol.
23:136-40(1993).[ 3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D., Davis S.J.,
Somoza C., Williams A.F. The leucocyte antigen factbooks. Academic Press, London / San
Diego, (1993).

673. Tryptophan synthase alpha chain signature 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion 
of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It
has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole
and glyceraldehyde 3-phosphate and the other for the synthesis of tryptophan from indole and
serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta 
chains), while in fungi the two domains are fused together on a single multifunctional protein. 
A conserved region that contains three conserved acidic residues has been selected as a 
signature pattern for the alpha chain. The first and the third acidic residues are believed to 
serve as proton donors/acceptors in the enzyme's catalytic mechanism. 
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-
[DE]-G

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).[ 2] Hyde C.C., Miles E.W.
Bio/Technology 8:27-32(1990).[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. 
U.S.A. 86:4604-4608(1989). 



674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion
of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It
has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole
and glyceraldehyde 3-phosphate and the other for the synthesis of tryptophan from indole and
serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta
chains), while in fungi the two domains are fused together on a single multifunctional protein.
The beta chain of the enzyme requires pyridoxal-phosphate as a cofactor. The
pyridoxal-phosphate group is attached to a lysine residue. The region around this lysine
residue also contains two histidine residues which are part of the pyridoxal-phosphate binding
site. The signature pattern for the tryptophan synthase beta chain is derived from that
conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[ 1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).[ 2] Hyde C.C., Miles E.W.
Bio/Technology 8:27-32(1990).[ 3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. 
U.S.A. 86:4604-4608(1989). 

675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge 
relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which itself is 
hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and 
histidine residues are well conserved in this family of proteases [1]. A partial list of proteases 
known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors 
VII, IX, X, XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - 
Chymotrypsins. - Complement components C1r, C1s, C2, and complement factors B, D and
I. - Complement-activating component of RA-reactive factor. - Cytotoxic cell proteases
(granzymes A to H). - Duodenase I. - Elastases 1, 2, 3A, 3B (protease E), leukocyte
(medullasin). - Enterokinase (EC 3.4.21.9) (enteropeptidase). - Hepatocyte growth factor
activator. - Hepsin. - Glandular (tissue) kallikreins (including EGF-binding protein types A,
B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin). -
Plasma kallikrein. - Mast cell proteases (MCP) 1 (chymase) to 8. - Myeloblastin (proteinase
3) (Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type). - 
Trypsins I, II, III, and IV. - Tryptases. - Snake venom proteases such as ancrod, batroxobin, 
cerastobin, flavoxobin, and protein C activator. - Collagenase from common cattle grub and 
collagenolytic protease from Atlantic sand fiddler crab. - Apolipoprotein(a). - Blood fluke
cercarial protease. - Drosophila trypsin-like proteases: alpha, easter, snake-locus. - Drosophila
protease stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins
belong to family S1 in the classification of peptidases [2,E1] and originate from eukaryotic
species. It should be noted that bacterial proteases that belong to family S2A are similar
enough in the regions of the active site residues that they can be picked up by the same
patterns. These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter
alpha-lytic protease. - Streptogrisin A and B (Streptomyces proteases A and B). -
Streptomyces griseus glutamyl endopeptidase II. - Streptomyces fradiae proteases 1 and 2.
Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]
Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-
[LIVMFYWH]-[LIVMFYSTANQH] [S is the active site residue]

[ 1] Brenner S. Nature 334:528-530(1988).[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol.
244:19-61(1994).[E1] 



676. (tsp) Thrombospondin type 1 domain 
[1] Bork P; FEBS lett 1993;327:125-130. 



677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of
two closely related subunits (alpha and beta). Tubulin binds two molecules of GTP at two
different sites (N and E). At the E (Exchangeable) site, GTP is hydrolyzed during
incorporation into the microtubule. Near the E site is an invariant region rich in glycines
which is found in both chains and which is now [3] said to control the access of the nucleotide
to its binding site. A signature pattern was developed from this region. With the exception of
the simple eukaryotes, most species express a variety of closely related alpha and beta
isotypes. In most species there is a third member of the tubulin family: gamma tubulin.
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles
or the centrosome, suggesting that it is involved in the minus-end nucleation of microtubule
assembly [4].

Consensus pattern: [SAG]-G-G-T-G-[SA]-G 
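
Because this signature has a fixed length and only two variable positions, the set of peptides it accepts can be written out in full. The following sketch enumerates them; the helper name expand_pattern and the list representation of the pattern are assumptions chosen for illustration.

```python
from itertools import product

def expand_pattern(classes):
    """Enumerate every peptide allowed by a fixed-length pattern.

    `classes` holds one string per position, listing the residues allowed there.
    """
    return ["".join(residues) for residues in product(*classes)]

# [SAG]-G-G-T-G-[SA]-G written as one residue class per position.
TUBULIN_SIGNATURE = ["SAG", "G", "G", "T", "G", "SA", "G"]

peptides = expand_pattern(TUBULIN_SIGNATURE)
# 3 choices at position 1 times 2 choices at position 6 gives 6 heptapeptides:
# SGGTGSG, SGGTGAG, AGGTGSG, AGGTGAG, GGGTGSG, GGGTGAG
```

Enumeration of this kind is only practical for short patterns without 'x' positions or variable-length gaps; longer signatures are better handled by pattern matching.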
[ 1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985).[ 2] Joshi H.C.,
Cleveland D.W. Cell Motil. Cytoskeleton 16:159-163(1990).[ 3] Hesse J., Thierauf M., 
Ponstingl H. J. Biol. Chem. 262:15472-15475(1987).[ 4] Joshi H.C. BioEssays 15:637- 
643(1993). 

Tubulin-beta mRNA autoregulation signal 

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1].
Unpolymerized tubulin subunits bind directly (or activate a factor(s) which binds
co-translationally) to the nascent N-terminus of beta-tubulin. This binding is transduced through
the adjacent ribosomes to activate an RNAse that degrades the polysome-bound mRNA. The
recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg- 
Glu-Ile. Mutations to this sequence abolish the autoregulation effect (except for the 
replacement of Glu by Asp); transposition of this sequence to an internal region of a 
polypeptide also suppresses the autoregulatory effect. 
Consensus pattern: <M-R-[DE]-[IL] 
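
The '<' at the start of this pattern restricts it to the N-terminus of the protein, which mirrors the statement above that moving the sequence to an internal position suppresses autoregulation. A minimal sketch of such an anchored test follows; the function name and the short test peptides are assumptions introduced for illustration and are not sequences from this document.

```python
import re

# '<' in the consensus pattern corresponds to '^' in a regular expression.
AUTOREGULATION_SIGNAL = re.compile(r"^MR[DE][IL]")

def has_autoregulation_signal(protein: str) -> bool:
    """True if the protein begins with the Met-Arg-[Asp/Glu]-[Ile/Leu] signal."""
    return AUTOREGULATION_SIGNAL.match(protein) is not None

# Hypothetical peptides used only to exercise the pattern.
wild_type = has_autoregulation_signal("MREIVHIQ")      # True: signal at the N-terminus
glu_to_asp = has_autoregulation_signal("MRDIVHIQ")     # True: Glu replaced by Asp
glu_to_lys = has_autoregulation_signal("MRKIVHIQ")     # False: other replacement
internal = has_autoregulation_signal("GSMREIVHIQ")     # False: signal is not N-terminal
```

Note that the pattern as written also admits Leu at the fourth position, which is broader than the Met-Arg-Glu-Ile element described in the text.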

[ 1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988). 

678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino
acids and
transfer them to specific tRNA molecules as the first step in protein biosynthesis. In 
prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA 
synthetases, one for each different amino acid. In eukaryotes there are generally two 
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a 
mitochondrial form. While all these enzymes have a common function, they are widely 
diverse in terms of subunit size and of quaternary structure. The synthetases specific for 
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, 
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common 
folding pattern in their catalytic domain for the binding of ATP and amino acid which is 
different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA 
synthetases do not share a high degree of similarity, however at least three conserved regions 
are present [2,5,8]. Signature patterns have been derived from two of these regions. 



Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-
[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).[ 2] Delarue M., Moras D. 
BioEssays 15:675-687(1993).[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).[ 4] Nagel
G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [ 5] Cusack S., 
Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).[ 6] Cusack S. 
Biochimie 75:1077-1081(1993).[ 7] Cusack S., Berthet-Colominas C, Haertlein M., Nassar 
N., Leberman R. Nature 347:249-255(1990).[ 8] Leveque F., Plateau P., Dessen P., Blanquet 
S. Nucleic Acids Res. 18:305-312(1990).

679. UBA-domain 

The UBA-domain (ubiquitin associated domain) is a novel sequence motif found in
several proteins having connections to ubiquitin and the ubiquitination pathway. The
structure of the UBA domain consists of a compact three helix bundle [1]. Number of
members: 84

[1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1
Vpr. Dieckmann T, Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct
Biol 1998;5:1042-1047.

680. UBX domain 

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of
members: 19

[1] The UBA domain: a sequence motif present in multiple enzyme classes of the 
ubiquitination pathway. Hofmann K, Bucher P; Trends Biochem Sci 1996;21:172-173. 

681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The first class consists
of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and
L3. - Yeast YUH1. - Drosophila Uch. One of the active site residues of class-I UCH [3] is a
cysteine. A signature pattern has been derived from the region around that residue.
Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the
active site residue]

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Johnston 
S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO J. 16:3787-3796(1997).[ 4]
Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994). 

682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1) 
Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The second class
consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1,
UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, 
UBP13, UBP14, UBP15 and UBP16. - Human tre-2. - Human isopeptidase T. - Human 
isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. - Drosophila fat 
facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - 
Caenorhabditis elegans hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical 
protein K02C4.3. These proteins only share two regions of similarity. The first region
contains a conserved cysteine which is probably implicated in the catalytic mechanism. The
second region contains two conserved histidine residues, one of which is also probably
implicated in the catalytic mechanism. Signature patterns for both conserved regions have 
been developed. 

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-
[SACV]-x-[LIVMS]-Q [C is the putative active site residue] 

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are 
putative active site residues] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Rawlings 
N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2)
Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. There are two distinct families of UCH. The second class
consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1,
UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, 
UBP13, UBP14, UBP15 and UBP16. - Human tre-2. - Human isopeptidase T. - Human 
isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. - Drosophila fat 
facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - 
Caenorhabditis elegans hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical 
protein K02C4.3.These proteins only share two regions of similarity. The first region 
containsa conserved cysteine which is probably implicated in the catalytic mechanism. The 
second region contains two conserved histidines residues, one of which is also probably 
implicated in the catalytic mechanism. Signature patterns for both conserved regions have 
been developed. 

Consensus pattern: G-[LIVMFY]-x(l ? 3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]- 
[SACV]-x-[LIVMS]-Q [C is the putative active site residue] 

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4 ? 5)-G-H-Y [The two H's are 
putative active site residues] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).[ 2] 
D'andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).[ 3] Rawlings 
N.D. ? Barrett AJ. Meth. Enzymol. 244:461-486(1994). 

684. UDP-glycosyltransferases signature 

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of 
the glycosyl group from a UDP-sugar to a small hydrophobic molecule. This family currently 
consists of: - Mammalian UDP-glucuronosyl transferases (UDPGT) [1,2]. A large family of 
membrane-bound microsomal enzymes which catalyze the transfer of glucuronic acid to a 
wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major 
importance in the detoxification and subsequent elimination of xenobiotics such as drugs and 
carcinogens. - A large number of putative UDPGT from Caenorhabditis elegans. - 
Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3] (also known as UDP-
galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer of galactose to 
ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are 
abundant sphingolipids of the myelin membrane of the central nervous system and peripheral 
nervous system. - Plant flavonol O(3)-glucosyltransferase. An enzyme [4] that catalyzes the 
transfer of glucose from UDP-glucose to a flavonol. This reaction is essential and one of the 
last steps in anthocyanin pigment biosynthesis. - Baculovirus ecdysteroid UDP-
glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from 
UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the 
insect host interferes with normal insect development by blocking the molting process. - 
Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved in carotenoid 
biosynthesis that catalyzes the glycosylation reaction which converts zeaxanthin to 
zeaxanthin-beta-diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These 
enzymes specifically inactivate macrolide antibiotics via 2'-O-glycosylation using UDP-
glucose. These enzymes share a conserved domain of about 50 amino acid residues located in 
their C-terminal section, from which a pattern has been extracted to detect them. 

Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-
[LIVMF]-[STAGCM]-[HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-
[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN] 

[ 1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, CRC Press, Boca Raton, (1980).
[ 2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., Iyanagi T., Jansen P.L., Lancet D., Mulder G.J., Chowdhury J.R., Siest G., Tephly T.R., Mackenzie P.I. DNA Cell Biol. 10:487-494(1991).
[ 3] Schulte S., Stoffel W. Proc. Natl. Acad. Sci. U.S.A. 90:10265-10269(1993).
[ 4] Furtek D., Schiefelbein J.W., Johnston F., Nelson O.E. Jr. Plant Mol. Biol. 11:473-481(1988).
[ 5] O'Reilly D.R., Miller L.K. Science 245:1110-1112(1989).
[ 6] Hernandez C., Olano C., Mendez C., Salas J.A. Gene 134:139-140(1993). 

685. UDP-glucose/GDP-mannose dehydrogenase family 

The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes 
which possess the ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to 
an acid without the release of an aldehyde intermediate [2]. Number of members: 55 

[1] Purification and characterization of guanosine diphospho-D-mannose 
dehydrogenase. A key enzyme in the biosynthesis of alginate by Pseudomonas aeruginosa. 
Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS, Chakrabarty AM; J Biol Chem 
1989;264:9380-9385. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase 
from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala 
RF, van de Rijn I, Tanner ME; J Biol Chem 1997;272:3416-3422. 

686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil 
residues from DNA by cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of 
misincorporation of dUMP residues by DNA polymerase or deamination of cytosine. The 
sequence of uracil-DNA glycosylase is extremely well conserved [2] in bacteria and 
eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are 
also found in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus 
and the mitochondria. Human UNG1 protein is transported to both the mitochondria and the 
nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to be required for mitochondrial 
localization [4], but the presence of a mitochondrial transit peptide has not been directly 
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region 
has been selected. This region contains an aspartic acid residue which has been proposed, 
based on X-ray structures [5,6], to act as a general base in the catalytic mechanism. 

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue] 

[ 1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988).
[ 2] Olsen L.C., Aasland R., Wittwer C.U., Krokan H.E., Helland D.E. EMBO J. 8:3121-3125(1989).
[ 3] Upton C., Stuart D.T., McFadden G. Proc. Natl. Acad. Sci. U.S.A. 90:4518-4522(1993).
[ 4] Slupphaug G., Markussen F.-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan H.E., Helland D.E. Nucleic Acids Res. 21:2579-2584(1993).
[ 5] Savva R., McAuley-Hecht K., Brown T., Pearl L. Nature 373:487-493(1995).
[ 6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878(1995).
[ 7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991).
[ 8] Meyer-Siegler K., Mauro D.J., Seal G., Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991).
[ 9] Muller S.J., Caradonna S. J. Biol. Chem. 268:1310-1319(1993).
[10] Barnes D.E., Lindahl T., Sedgwick B. Curr. Opin. Cell Biol. 5:424-433(1993). 
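
The consensus patterns in this listing follow the PROSITE notation: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, curly braces list residues that are excluded, and a number or range in parentheses gives a repeat count. As a minimal sketch, assuming Python's standard re module is an acceptable search engine, the uracil-DNA glycosylase pattern can be translated into a regular expression as follows. The function name and the test sequence are hypothetical and are not part of the specification.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        if not element:          # tolerate a stray trailing hyphen
            continue
        # Split off an optional repeat count such as (2) or (4,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} = any residue except these
        # Bracketed sets such as [LIV] and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

UNG_PATTERN = "[KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y"
UNG_REGEX = prosite_to_regex(UNG_PATTERN)

# Hypothetical toy sequence containing one instance of the signature.
toy_sequence = "AAKVVILGQDPYHG"
match = re.search(UNG_REGEX, toy_sequence)
# The active-site D is the eighth element of the pattern (offset 7 in the match).
active_site_position = match.start() + 7 + 1   # 1-based residue number
```

The same helper can be applied to most of the other consensus patterns listed here, provided OCR damage in the pattern text has been repaired first.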

687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast chromosome II hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical 
protein F09E5.8. - Bacillus subtilis hypothetical protein ylmE. - Escherichia coli hypothetical 
protein yggS and HI0090, the corresponding Haemophilus influenzae protein. - Helicobacter 
pylori hypothetical protein HP0395. - Mycobacterium tuberculosis hypothetical protein 
MtCY270.2CL - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A 
Pseudomonas aeruginosa hypothetical protein in pilT 5' region. - A Vibrio alginolyticus 
hypothetical protein in pilT 5' region. These are proteins of 25 to 30 Kd which contain a 
number of conserved regions. The best conserved region, which is located in the first third of 
these proteins, has been selected as a signature pattern. 

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV] 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996). 

688. Uncharacterized protein family UPF0003 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli protein aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli 
hypothetical protein yjeP and HI0195.1, the corresponding Haemophilus influenzae protein. - 
Escherichia coli hypothetical protein ynaI. - Bacillus subtilis hypothetical protein yhdY. - 
Helicobacter pylori hypothetical protein HP0415. - Synechocystis strain PCC 6803 
hypothetical protein slr0639. - Archaeoglobus fulgidus hypothetical protein AF1546. - 
Methanococcus jannaschii hypothetical protein MJ0170. - Methanococcus jannaschii 
hypothetical protein MJ1143. The size of these proteins ranges from 30 to 120 Kd. They all 
contain a number of transmembrane regions. The best conserved region, which is located in 
and just after the last potential transmembrane region, has been selected as a signature 
pattern. 

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-
P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N 

[ 1] Bairoch A. Unpublished observations (1997). 

689. Uncharacterized protein family UPF0004 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and 
HI0019, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical 
protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter pylori 
hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 
5' region. - Mycobacterium leprae hypothetical protein B2235_C2_195. - Pseudomonas 
aeruginosa hypothetical protein in hemL 3' region. - Synechocystis strain PCC 6803 
hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical protein sll0996. - 
Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii 
hypothetical protein MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size 
of these proteins ranges from 47 to 61 Kd. They contain six conserved cysteines, three of 
which are clustered in a region that can be used as a signature pattern. 

Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G 

[1] Bairoch A. Unpublished observations (1997). 

690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT 
(Testis Enhanced Gene Transcript). - Escherichia coli hypothetical protein yccA and HI0044, 
the corresponding Haemophilus influenzae protein. - A probable Pseudomonas aeruginosa 
ortholog of yccA. These are proteins of about 25 Kd which seem to contain seven 
transmembrane domains. A signature pattern that corresponds to a region that starts with the 
beginning of the third transmembrane domain and ends in the middle of the fourth one has 
been developed. 

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-
[LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F 

[ 1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995). 

691. Uncharacterized protein family UPF0006 signatures 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast chromosome II hypothetical protein YBL055c. - Escherichia coli hypothetical protein 
ycfH and HI0454, the corresponding Haemophilus influenzae protein. - Escherichia coli 
hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081, the 
corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. 
- Haemophilus influenzae hypothetical protein HI1664. - Mycoplasma genitalium 
hypothetical protein MG009. These are proteins of 24 to 47 Kd which contain a number 
of conserved regions. They can be picked up in the database by the following patterns. 

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN] 

Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE] 

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995). 
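
When a family is described by several consensus patterns, as UPF0006 is, a database scan can report which of the signatures each sequence carries. The following is a minimal sketch of that idea. The regular expressions are hand translations of the three patterns, and the signature labels, function name and toy sequences are hypothetical illustrations rather than real database entries.

```python
import re

# The three UPF0006 consensus patterns, hand-translated to regular expressions.
# The dictionary keys are illustrative labels, not official accession numbers.
SIGNATURES = {
    "UPF0006_1": re.compile(r"[LIVMFY]{2}D[STA]H.H[LIVMF][DN]"),
    "UPF0006_2": re.compile(r"P[LIVM].[LIVM]H.R.[TA].[DE]"),
    "UPF0006_3": re.compile(r"[LVSA][LIVA].{2}[LIVM][PS].{3}L[LIVM][LIVMS]ETD.P"),
}

def signature_hits(sequence):
    """Return the labels of all signatures found in a protein sequence."""
    return [name for name, regex in SIGNATURES.items() if regex.search(sequence)]

# Hypothetical toy "database": made-up sequences built around motif instances.
DATABASE = {
    "all_three": "MK" + "LIDSHCHLD" + "GG" + "PVQIHTRDAEE" + "GG"
                 + "LAKEIPLDRLLLETDAP" + "K",
    "first_only": "MK" + "LIDSHCHLD" + "GGGG",
    "unrelated": "MKKKKGGGG",
}

report = {name: signature_hits(seq) for name, seq in DATABASE.items()}
```

Requiring all three hits gives a stricter family assignment than accepting any single hit. Which threshold is appropriate is a design choice that the text does not specify.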

692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical 
protein ygbP and HI0672, the corresponding Haemophilus influenzae protein. - Bacillus 
subtilis hypothetical protein yacM. - Mycobacterium tuberculosis hypothetical protein 
MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A 
Rhodobacter capsulatus hypothetical protein in nifR3 5' region. Except for the Rhodobacter 
protein, which contains a C-terminal extension, all these proteins have from 225 to 236 amino 
acids. They are hydrophilic proteins that can be picked up in the database by the following 
pattern. 

Consensus pattern: V-L-[IV]-H-D-[GA]-A-R 

[ 1] Bairoch A. Unpublished observations (1997). 

693. Uncharacterized protein family UPF0015 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast chromosome II hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical 
protein YMR101c. - Escherichia coli hypothetical protein yaeU and HI0920, the 
corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein 
HP1221. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium 
glutamicum hypothetical protein in aroF 3' region. - A Streptomyces fradiae hypothetical 
protein in transposon Tn4556. - Synechocystis strain PCC 6803 hypothetical protein sll0505. 
- Methanococcus jannaschii hypothetical protein MJ1372. These are proteins of about 26 to 
40 Kd whose central region is well conserved. They can be picked up in the database by the 
following pattern. 

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q 

[ 1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994). 

694. Uncharacterized protein family UPF0016 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast hypothetical protein YBR187w. - Fission yeast hypothetical protein SpAC17G8.08c. - 
Mouse protein pFT27. - Synechocystis strain PCC 6803 hypothetical protein sll0615. These 
are hydrophobic proteins of 200 to 320 amino acids that seem to contain six or seven 
transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this 
family, to directly follow the second transmembrane domain has been selected as a signature 
pattern. 

Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A 

[ 1] Bairoch A. Unpublished observations (1996). 

695. Uncharacterized protein family UPF0021 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast chromosome VII hypothetical protein YGL211w. - Dictyostelium discoideum protein 
veg136. - Methanococcus jannaschii hypothetical proteins MJ1157 and MJ1478. These are 
proteins of 300 to 360 residues. They can be picked up in the database by the following 
pattern, which is located in their N-terminal section. 

Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D 

[ 1] Bairoch A. Unpublished observations (1997). 

696. Uncharacterized protein family UPF0023 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Mouse protein 22A3. - Yeast chromosome XII hypothetical protein YLR022c. - 
Caenorhabditis elegans hypothetical protein W06E11.4. - Methanococcus jannaschii 
hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be 
picked up in the database by the following pattern. 

Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G 

[ 1] Bairoch A. Unpublished observations (1997). 

697. Uncharacterized protein family UPF0024 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein ygbO and HI0701, the corresponding Haemophilus 
influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome XV 
hypothetical protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11. - 
Methanococcus jannaschii hypothetical proteins MJ0588 and MJ1364. These are hydrophilic 
proteins of 39 to 77 Kd. They can be picked up in the database by the following pattern. 

Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC] 

[ 1] Bairoch A. Unpublished observations (1997). 

698. Uncharacterized protein family UPF0025 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein yfcE. - Bacillus subtilis hypothetical protein ysnB. - 
Mycoplasma genitalium and pneumoniae hypothetical protein MG207. - Methanococcus 
jannaschii hypothetical proteins MJ0623 and MJ0936. These are hydrophilic proteins of 
about 20 Kd. They can be picked up in the database by the following pattern. 

Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G 

[ 1] Bairoch A. Unpublished observations (1997). 

699. Uncharacterized protein family UPF0029 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Yeast chromosome III hypothetical protein YCR59c. - Yeast chromosome IV hypothetical 
protein YDL177C. - Escherichia coli hypothetical protein yigZ and HI0722, the 
corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yvyE. 
- A Thermus aquaticus hypothetical protein in pol 5' region. These proteins can be picked up 
in the database by the following pattern. 

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-
[FYW](2)-G-G-x(2)-[LIVM]-G 

[ 1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994). 

700. Uncharacterized protein family UPF0030 signature 

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast 
chromosome VI hypothetical protein YFL060c. - Yeast chromosome XIII hypothetical 
protein YMR095c. - Yeast chromosome XIV hypothetical protein YNL334c. - Bacillus 
subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. - 
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of 
about 19 to 25 Kd. They can be picked up in the database by the following pattern. 
Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA] 
[ 1] Bairoch A. Unpublished observations (1997). 

701. Uncharacterized protein family UPF0032 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein yigU and HI0188, the corresponding Haemophilus 
influenzae protein. - Bacillus subtilis hypothetical protein ycbT. - Mycobacterium 
tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding 
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. 
- Odontella sinensis and Porphyra purpurea chloroplast hypothetical protein ycf43. These 
proteins have from 245 to 317 amino acids and seem to contain at least six or seven 
transmembrane regions. A conserved region located in the central section of these proteins 
has been developed as a signature pattern. 

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM] 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996). 

702. Uncharacterized protein family UPF0034 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein yhdG and HI0979, the corresponding Haemophilus 
influenzae protein. - Escherichia coli hypothetical protein yjbN and HI0634, the 
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein yohI 
and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis 
hypothetical protein yacF. - Rhodobacter capsulatus protein nifR3 and related proteins in 
Azospirillum brasilense and Rhizobium leguminosarum. - Synechocystis strain PCC 6803 
hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein sll0926. - 
Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast 
hypothetical protein YLR401c. - Yeast hypothetical protein YLR405w. - Yeast hypothetical 
protein YML080w. Although it has been proposed [2] that Rhodobacter capsulatus nifR3 is a 
transcriptional regulatory protein, it is believed that these proteins constitute a family of 
enzymes whose active site could include a conserved cysteine which has been used as the 
central part of a signature pattern. 

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC] 

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[ 2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz R.G. Mol. Microbiol. 8:903-914(1993). 

703. Uncharacterized protein family UPF0038 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - 
Escherichia coli hypothetical protein yacE and HI0890, the corresponding Haemophilus 
influenzae protein. - Mycobacterium tuberculosis hypothetical protein MtCY01B2.23 and 
O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain PCC 6803 
hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila, 
Bacteroides nodosus, Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus 
and Xanthomonas campestris. - Human hypothetical protein pOV-2. - Yeast hypothetical 
protein YDR196C. - Caenorhabditis elegans hypothetical protein T05G5.5. These proteins all 
contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see 
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the 
exception of the Mycobacterial sequences, which are 410 residues long). A conserved 
region some 50 residues away from the ATP-binding P-loop has been developed as a 
signature pattern. 

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV] 

[ 1] Rudd K.E., Bairoch A. Unpublished observations (1997). 
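
The UPF0038 description places the signature region some 50 residues downstream of the N-terminal P-loop. As a rough sketch of how that spacing could be checked, the code below searches for the P-loop first and then for the signature after it. It assumes the commonly cited form of the ATP/GTP-binding motif 'A', [AG]-x(4)-G-K-[ST], which is not spelled out in this entry. The function name and the toy sequence are hypothetical.

```python
import re

# Assumed form of the ATP/GTP-binding motif 'A' (P-loop): [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")
# Hand translation of the UPF0038 consensus pattern.
UPF0038 = re.compile(r"G.[LI].R.{2}L.{4}F.{8}[LIV].{5}P.[LIV]")

def p_loop_to_signature_gap(sequence):
    """Return the number of residues between the P-loop and the signature.

    Returns None if either motif is missing, or if the signature does not
    occur after the end of the P-loop.
    """
    p_loop = P_LOOP.search(sequence)
    if p_loop is None:
        return None
    signature = UPF0038.search(sequence, p_loop.end())
    if signature is None:
        return None
    return signature.start() - p_loop.end()

# Hypothetical toy sequence: P-loop, a 50-residue spacer, then the signature.
signature_instance = ("GQLQR" + "QQ" + "L" + "Q" * 4 + "F" + "Q" * 8
                      + "L" + "Q" * 5 + "PQL")
toy_sequence = "M" + "GQQQQGKS" + "Q" * 50 + signature_instance + "Q"
gap = p_loop_to_signature_gap(toy_sequence)
```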

704. Ubiquitin-conjugating enzymes active site 

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent 
attachment of ubiquitin to target proteins. An activated ubiquitin moiety is transferred from a 
ubiquitin-activating enzyme (E1) to E2, which later ligates ubiquitin directly to substrate 
proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most species 
there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular 
functions. A cysteine residue is required for ubiquitin-thiolester formation. There is a single 
conserved cysteine in UBCs and the region around that residue is conserved in the sequence 
of known UBC isozymes. That region has been used as a signature pattern. 

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C 
is the active site residue] 

[ 1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990).
[ 2] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[ 3] Hershko A. Trends Biochem. Sci. 16:265-268(1991). 
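
Because the UBC pattern contains a variable gap, x(3,4), the active-site cysteine does not sit at a fixed offset from the start of a match. One way to locate it, sketched below under the assumption that Python's re module is used, is to wrap the cysteine in a named group. The group name, function name and toy sequence are hypothetical.

```python
import re

# Hand translation of the UBC active-site pattern; the active-site cysteine
# is captured in a named group so its position survives the variable gap.
UBC_ACTIVE_SITE = re.compile(
    r"[FYWLSP]H[PC][NH][LIV].{3,4}G.[LIV](?P<cys>C)[LIV].[LIV]"
)

def active_site_cysteines(sequence):
    """Return 1-based positions of putative active-site cysteines."""
    return [m.start("cys") + 1 for m in UBC_ACTIVE_SITE.finditer(sequence)]

# Hypothetical toy sequence with one instance of the signature.
toy_sequence = "AA" + "FHPNINSNGSICLDIL" + "KD"
positions = active_site_cysteines(toy_sequence)
```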

705. Uroporphyrinogen decarboxylase signatures 

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic 
pathway, catalyzes the sequential decarboxylation of the four acetyl side chains of 
uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency is responsible for the 
human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic 
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. 
The best conserved region is located in the N-terminal section; it contains a 
perfectly conserved hexapeptide. There are two arginine residues in this hexapeptide which 
could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate 
side chains of the substrate. This region has been used as a signature pattern. A second 
signature pattern is based on another well conserved region which is located in the central 
section of the protein. 

Consensus pattern: P-x-W-x-M-R-Q-A-G-R 

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK] 

[ 1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P. 
Eur. J. Biochem. 205:1011-1016(1992). 

706. ubiE/COQ5 methyltransferase family signatures 

The following methyltransferases have been shown [1] to share regions of similarities: - 
Escherichia coli ubiE, which is involved in both ubiquinone and menaquinone biosynthesis 
and which catalyzes the S-adenosylmethionine dependent methylation of 2-polyprenyl-6-
methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol and of 
demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis 
methyltransferase. - Bacillus subtilis spore germination protein C2 (gene: gercB or gerC2), a 
probable menaquinone biosynthesis methyltransferase. - Lactococcus lactis gerC2 homolog. - 
Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani amastigote-
specific protein A41. These are hydrophilic proteins of about 30 Kd (except for ZK652.9, 
which is 65 Kd). They can be picked up in the database by the following patterns. 

Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W 

Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S 

[ 1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997). 

707. Uricase signature 

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of 
urate into allantoin. Some species, like primates and birds, have lost the gene for uricase and 
are therefore unable to degrade urate. Uricase is a protein of 300 to 400 amino acids. A highly 
conserved region located in the central part of the sequence has been used as a signature 
pattern. 

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-
x(2)-L-x(5)-R 

[ 1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988). 

708. Universal stress protein family (Usp) 

By a wide range of stress conditions members of the Usp family are predicted to be 
related to the MADS-box proteins transcript_fact and bind to DNA [2]. Number of members: 
39 

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during 
growth arrest. Nystrom T, Neidhardt FC; Mol Microbiol 1994; 11:537-544. 
[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. 
Mushegian AR, Koonin EV; Genetics 1996; 144:817-828. 

709. Ubiquitin domain signature and profile 

Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells 
and whose sequence is extremely well conserved from protozoan to vertebrates. It plays a key 
role in a variety of cellular processes, such as ATP-dependent selective degradation of 
cellular proteins, maintenance of chromatin structure, regulation of gene expression, stress 
response and ribosome biogenesis. In most species, there are many genes coding for 
ubiquitin. However they can be classified into two classes. The first class produces 
polyubiquitin molecules consisting of exact head to tail repeats of ubiquitin. The number of 
repeats is variable (up to twelve in a Xenopus gene). In the majority of polyubiquitin 
precursors, there is a final amino-acid after the last repeat. The second class of genes 
produces precursor proteins consisting of a single copy of ubiquitin fused to a C-terminal 
extension protein (CEP). There are two types of CEP proteins and both seem to be ribosomal 
proteins. Ubiquitin is a globular protein, the last four C-terminal residues (Leu-Arg-Gly-Gly) 
extending from the compact structure to form a 'tail', important for its function. The latter is 
mediated by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage 
between the C-terminal glycine and the epsilon amino group of lysine residues in the target 
proteins. There are a number of proteins which are evolutionarily related to ubiquitin: - 
Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea 
viruses (BVDV). These proteins are highly similar to their eukaryotic counterparts. - 
Mammalian protein GDX [4]. GDX is composed of two domains, an N-terminal ubiquitin-like 
domain of 74 residues and a C-terminal domain of 83 residues with some similarity with the 
thyroglobulin hormonogenic site. - Mammalian protein FAU [5]. FAU is a fusion protein 
which consists of an N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein 
S30. - Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 residues. - Human protein 
BAT3, a large fusion protein of 1132 residues that contains an N-terminal ubiquitin-like 
domain. - Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists of 
an N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A. - Yeast 
DNA repair protein RAD23 [8]. RAD23 contains an N-terminal domain that seems to be 
distantly, yet significantly, related to ubiquitin. - Mammalian RAD23-related proteins 
RAD23A and RAD23B. - Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a 
protein of 274 residues that contains a central ubiquitin-like domain. - Human spliceosome 
associated protein 114 (SAP 114 or SF3A120). - Yeast protein DSK2, a protein involved in 
spindle pole body duplication and which contains an N-terminal ubiquitin-like domain. - 
Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and 
Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal 
ubiquitin domain and a C-terminal CAP-Gly domain. - Schizosaccharomyces pombe 
hypothetical protein SpAC26A3.16. This protein contains an N-terminal ubiquitin domain. - 
Yeast protein SMT3. - Human ubiquitin-like proteins SMT3A and SMT3B. - Human 
ubiquitin-like protein SMT3C (also known as PIC1, Ubl1, Sumo-1, Gmp-1 or Sentrin). This 
protein is involved in targeting ranGAP1 to the nuclear pore complex protein ranBP2. - 
SMT3-like proteins in plants and Caenorhabditis elegans. To identify ubiquitin and related 
proteins, a pattern has been developed based on conserved positions in the central section of 
the sequence. A profile was also developed that spans the complete length of the ubiquitin 
domain. 

Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-
[LIVMC]-[LIVMFY]-x-G-x(4)-[DE] 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[ 2] Monia B.P., Ecker D.J., Crooke S.T. Bio/Technology 8:209-215(1990).
[ 3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985).
[ 4] Filippi M., Tribioli C., Toniolo D. Genomics 7:453-457(1990).
[ 5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993).
[ 7] Jones D., Candido E.P. J. Biol. Chem. 268:19545-19551(1993).
[ 8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993). 
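
The ubiquitin entry describes two things that are easy to check in code: the consensus pattern in the central section of the sequence, and polyubiquitin precursors made of exact head to tail repeats followed by a final amino acid. The sketch below uses the widely published 76-residue human ubiquitin sequence, which is not quoted in this specification. The function name and the synthetic three-repeat precursor are hypothetical.

```python
import re

# Human ubiquitin (76 residues), ending in the Leu-Arg-Gly-Gly tail.
UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
             "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")

# Hand translation of the ubiquitin domain consensus pattern.
UBIQUITIN_RE = re.compile(
    r"K.{2}[LIVM].[DESAK].{3}[LIVM][PA].{3}Q.[LIVM][LIVMC][LIVMFY].G.{4}[DE]"
)

def split_polyubiquitin(precursor, unit=UBIQUITIN):
    """Count exact head-to-tail repeats of `unit` and return the leftover tail."""
    repeats = 0
    position = 0
    while precursor.startswith(unit, position):
        repeats += 1
        position += len(unit)
    return repeats, precursor[position:]

match = UBIQUITIN_RE.search(UBIQUITIN)
signature_span = (match.start() + 1, match.end())   # 1-based, inclusive

# Synthetic precursor: three repeats plus one extra C-terminal residue.
repeats, extra = split_polyubiquitin(UBIQUITIN * 3 + "N")
```

In this sequence the signature runs from Lys-27 to Asp-52, consistent with the statement that the pattern is based on the central section of the sequence.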

710. VHS domain 

Domain present in VPS-27, Hrs and STAM. Number of members: 27

711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the
actin-based microfilaments to the plasma membrane. Vinculin is located at the cytoplasmic
side of focal contacts or adhesion plaques. In addition to actin, vinculin interacts with other
structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of
about 90 Kd separated from a basic C-terminal domain of about 25 Kd by a proline-rich
region of about 50 residues. The central part of the N-terminal domain consists of a variable
number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110-amino-acid
domain.

Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of
cadherins. The association of catenins with cadherins produces a complex which is linked to
the actin filament network, and which seems to be of primary importance for the cell-adhesion
properties of cadherins. Three different types of catenins seem to exist: alpha, beta, and
gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to
vinculin. In terms of their structure, the most significant differences are the absence, in
alpha-catenin, of the repeated domain and of the proline-rich segment.

Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin,
of a domain that seems to be involved in the interaction with talin. The second pattern is
based on a conserved region in the N-terminal part of the repeated domain of vinculin.
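
The masses and residue counts quoted for vinculin can be cross-checked with simple arithmetic. The sketch below is only a rough plausibility check: it assumes an average residue mass of about 110 Da, a common rule of thumb that does not come from this document, and the helper name estimate_residues is hypothetical.

```python
# Assumption: ~110 Da is a rule-of-thumb average mass for one amino acid
# residue in a protein. It is not a figure given in the text.
AVG_RESIDUE_MASS_DA = 110.0

def estimate_residues(mass_kd: float) -> int:
    """Rough residue count for a protein or domain of the given mass in Kd."""
    return round(mass_kd * 1000.0 / AVG_RESIDUE_MASS_DA)

if __name__ == "__main__":
    for label, mass_kd in [
        ("vinculin", 116),
        ("vinculin N-terminal domain", 90),
        ("vinculin C-terminal domain", 25),
        ("alpha-catenin", 100),
    ]:
        print(f"{label}: {mass_kd} Kd -> roughly {estimate_residues(mass_kd)} residues")
```

Under that assumption a 116 Kd protein comes out slightly above 1000 residues, which is in line with the "about 1000 residues" stated for vinculin.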

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L 
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P 

[ 1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990).
[ 2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter M., Kemler R.
Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).

712. (Vitellogenin N) Lipoprotein amino terminal region 

This family contains regions from vitellogenin, microsomal triglyceride transfer
protein and apolipoprotein B-100. These proteins are all involved in lipid transport [1]. This
family contains the LV1n chain from lipovitellin, which contains two structural domains.
Number of members: 33

[1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein.
Anderson TA, Levitt DG, Banaszak LJ; Structure 1998;6:895-909.

713. (vMSA) Major surface antigen from hepadnavirus

714. ssDNA binding protein (Viral DNA bp) 

This protein is found in herpesviruses and is needed for 
replication. 

715. (Voltage CLC) Voltage gated chloride channels

This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a 
single pore. It has been shown that some members of this family form homodimers. These 
proteins contain two CBS domains. 

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521. 

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM,
Ptacek LJ; Neurology 1996;47:993-998.

716. von Willebrand factor type A domain (vwa) 
More von Willebrand factor type A domains? Sequence 
similarities with malaria thrombospondin-related 
anonymous protein, dihydropyridine-sensitive calcium 
channel and inter-alpha-trypsin inhibitor. 

Bork P, Rohde K; 

Biochem J 1991;279:908-911. 



1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.
Type A modules: interacting domains found in several non-fibrillar
collagens and in other extracellular matrix proteins.
MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D.
and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in
factor B of human complement by Fourier transform infrared spectroscopy.
Its occurrence in collagen types VI, VII, XII and XIV, the integrins and
other proteins by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with
malaria thrombospondin-related anonymous protein, dihydropyridine-sensitive
calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted
to be similar to the open twisted beta-sheet flanked by alpha-helices
found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3
(CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1,
alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

The von Willebrand factor is a large multimeric glycoprotein found in blood 
plasma. Mutant forms are involved in the aetiology of bleeding disorders 
[1]. In von Willebrand factor, the type A domain (vWF) is the prototype for 
a protein superfamily. The vWF domain is found in various plasma proteins: 
complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen 
types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins 
that incorporate vWF domains participate in numerous biological events 
(e.g., cell adhesion, migration, homing, pattern formation, and signal 
transduction), involving interaction with a large array of ligands [2]. 
Secondary structure prediction from 75 aligned vWF sequences has revealed 
a largely alternating sequence of alpha-helices and beta-strands [3]. Fold
recognition algorithms were used to score sequence compatibility with a 
library of known structures: the vWF domain fold was predicted to be a 
doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 
3D structures have been determined for the I-domains of integrins CD11b
(with bound magnesium) [6] and CD11a (with bound manganese) [7]. The domain
adopts a classic alpha/beta Rossmann fold and contains an unusual metal
ion coordination site at its surface. It has been suggested that this site
represents a general metal ion-dependent adhesion site (MIDAS) for binding
protein ligands [6]. The residues constituting the MIDAS motif in the CD11b
and CD11a I-domains are completely conserved, but the manner in which the
metal ion is coordinated differs slightly [7].

VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF 
domain superfamily. The fingerprint was derived from an initial alignment 
of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved 
residues involved in metal ion coordination in I-domains (Asp and 2 serines 
in positions 8, 10 and 12, respectively); motif 2 spans strands beta-2 and 
beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 
7), which coordinates the metal ion [6,7]. Three iterations on OWL27.0 were 
required to reach convergence, at which point a true set comprising 56 
sequences was identified. Numerous partial matches were also found. 




717. (WD40) WD domain, G-beta repeat 

The ancient regulatory-protein family of WD-repeat proteins. 

Neer EJ, Schmidt CJ, Nambudripad R, Smith TF; 

Nature 1994;371:297-300. 

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) 
of the guanine nucleotide-binding proteins (G proteins) which act as 
intermediaries in the transduction of signals generated by transmembrane 
receptors [1]. The alpha subunit binds to and hydrolyzes GTP; the functions of 
the beta and gamma subunits are less clear but they seem to be required for 
the replacement of GDP by GTP as well as for membrane anchoring and 
receptor recognition. 

In higher eukaryotes G-beta exists as a small multigene family of highly 
conserved proteins of about 340 amino acid residues. Structurally G-beta 
consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40
repeat). Such a repetitive segment has been shown [E1,2,3,4,5] to exist in a
number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta 
like protein that associates with GPA1 (G-alpha) and STE18 (G-gamma). 

- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is 
most probably also a G-beta protein. 

- Human and chicken protein 12.3. The function of this protein is not known,
but on the basis of its similarity to G-beta proteins, it may also function
in signal transduction.

- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog 
of vertebrate protein 12.3. 

- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2]. 

- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic
protein complex that reversibly associates with Golgi membranes to form
vesicles that mediate biosynthetic protein transport.

- Yeast CDC4, essential for initiation of DNA replication and separation of 
the spindle pole bodies to form the poles of the mitotic spindle. 

- Yeast CDC20, a protein required for two microtubule-dependent processes: 
nuclear movements prior to anaphase and chromosome separation. 

- Yeast MAK11, essential for cell growth and for the replication of M1
double-stranded RNA.

- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with 
a probable role in mRNA splicing. 

- Yeast PWP1, a protein of unknown function. 

- Yeast SKI8, a protein essential for controlling the propagation of double- 
stranded RNA. 

- Yeast SOF1, a protein required for ribosomal RNA processing which 
associates with U3 small nucleolar RNA. 

- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been
implicated in dTMP uptake, catabolite repression, mating sterility, and
many other phenotypes.

- Yeast YCR57c, an ORF of unknown function from chromosome III. 

- Yeast YCR72c, an ORF of unknown function from chromosome III. 

- Slime mold coronin, an actin-binding protein. 

- Slime mold AAC3, a developmentally regulated protein of unknown function.

- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'),
a protein involved in neurogenesis and that seems to interact with the
Notch and Delta proteins.

- Drosophila TAF-II-80, a protein that is tightly associated with TFIID. 

The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and 
Groucho) and 8 (G-beta, STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G- 
beta like proteins, the repeats span the entire length of the sequence, while 
in other proteins, they make up the N-terminal, the central or the C-terminal 
section.

A signature pattern can be developed from the central core of the domain
(positions 9 to 23).

-Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-
x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]

[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaiatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).

718. WHEP-TRS domain containing proteins 

A conserved domain of 46 amino acids has been shown [1] to exist in a number 
of higher eukaryote aminoacyl-transfer RNA synthetases. This domain is present 
one to six times in the following enzymes: 

- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present 
three times in a region that separates the N-terminal glutamyl-tRNA 
synthetase domain from the C-terminal prolyl-tRNA synthetase domain. 

- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present 
six times in the intercatalytic region. 

- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the
N-terminal extremity.

- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is 
found at the N-terminal extremity [2]. 




- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal 
extremity. 

This domain, which is called WHEP-TRS, could contain a central alpha-helical 
region and may play a role in the association of tRNA-synthetases into 
multienzyme complexes. 

A signature pattern based on the first 29 positions of the WHEP-TRS
domain has been developed.

-Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)- 
[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K 

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M.
EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

719. (Worm family 8) Putative membrane protein 

Analysis of protein domain families in Caenorhabditis elegans. 
Sonnhammer EL, Durbin R; 
Genomics 1997;46:200-216. 

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.

720. Xylose isomerase 

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which 
catalyzes the interconversion of D-xylose to D-xylulose. It can also isomerize 
D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems to 
require magnesium for its activity, while cobalt is necessary to stabilize the 
tetrameric structure of the enzyme. A number of residues are conserved in all 
known xylose isomerases.

Xylose isomerase also exists in plants [2] where it is homodimeric and is
manganese-dependent. 

Two signature patterns for xylose isomerase have been developed. The first one is
derived from a stretch of five conserved amino acids that includes a glutamic
acid residue known to be one of the four residues involved in the binding of
the magnesium ion [3]; this pattern also includes a lysine residue which is
involved in the catalytic activity. The second pattern is derived from a
conserved region in the N-terminal section of the enzyme that includes a
histidine residue which has been shown [4] to be involved in the catalytic
mechanism of the enzyme.

-Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]
-Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue]
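
The bracketed notes attached to the two xylose isomerase patterns single out individual residues within each match. One way to recover the positions of those residues, sketched here under stated assumptions, is to wrap them in named groups of a regular expression. The names SIGNATURES and annotate are hypothetical, the tilde printed in the first pattern is assumed to be a hyphen, and the test sequence is invented purely for illustration.

```python
import re

# Each pattern is written as a regular expression; the annotated residues
# are wrapped in named groups so their positions can be read back.
SIGNATURES = {
    # [LI]-E-P-K-P-x(2)-P, E = magnesium ligand, K = active site residue
    "pattern 1": re.compile(r"[LI](?P<magnesium_ligand>E)P(?P<active_site>K)P.{2}P"),
    # [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE], H = active site residue
    "pattern 2": re.compile(r"[FL](?P<active_site>H)D.D[LIV].[PD].[GDE]"),
}

def annotate(sequence: str) -> list:
    """Return (pattern, role, residue, 1-based position) for every annotated residue."""
    hits = []
    for name, regex in SIGNATURES.items():
        for match in regex.finditer(sequence):
            for role in regex.groupindex:
                hits.append((name, role, match.group(role), match.start(role) + 1))
    return hits

# Hypothetical sequence built to contain one copy of each signature.
# It is not a real xylose isomerase.
TOY_SEQUENCE = "MSAFHDRDLAPEGTTTLEPKPNEPRGG"

if __name__ == "__main__":
    for hit in annotate(TOY_SEQUENCE):
        print(hit)
```

Named groups keep the biological role next to the residue it describes, so the annotation survives if the pattern is later edited.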

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

721. XPG protein signatures

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer. Skin cells
of people with this condition are hypersensitive to ultraviolet light, due to
defects in the incision step of DNA excision repair. There are at least seven
genetic complementation groups involved in this pathway: XP-A to XP-G. The
defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG (or
XPGC) [2].

XPG belongs to a family of proteins [2,3,4,5,6] that is composed of two main
subsets:

- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13 from
fission yeast. RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG
makes the 3' incision in human DNA nucleotide excision repair [9].

- Subset 2, to which belong mouse and human FEN-1, rad2 from fission yeast,
and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also
includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act
in a pathway that corrects mismatched base pairs.

- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.

- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are
largely confined to two regions. The first is located at the N-terminal
extremity (N-region) and corresponds to the first 95 to 105 amino acids. The
second region is internal (I-region) and found towards the C-terminus; it
spans about 140 residues and contains a highly conserved core of 27 amino
acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible
that the conserved acidic residues are involved in the catalytic mechanism of
DNA excision repair in XPG. The amino acids linking the N- and I-regions are
not conserved; indeed, they are largely absent from proteins belonging to the
second subset.

Two signature patterns have been developed for these proteins. The first
corresponds to the central part of the N-region, the second to part of the
I-region and includes the putative catalytic core pentapeptide.

Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K
Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-
[CLM]

[ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G.
Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R.
Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M.,
Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem.
269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D.
Nature 371:432-435(1994).

722. Xanthine/uracil permeases family 

The following transport proteins, which are involved in the uptake of xanthine
or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.

- Purine permease (gene uapC) from Aspergillus nidulans. 

- Xanthine permease from Bacillus subtilis (gene pbuX).

- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene 
pyrP). 

- Hypothetical protein ycdG from Escherichia coli. 

- Hypothetical protein ygfO from Escherichia coli. 

- Hypothetical protein ygfU from Escherichia coli. 

- Hypothetical protein yicE from Escherichia coli. 

- Hypothetical protein yunJ from Bacillus subtilis.

- Hypothetical protein yunK from Bacillus subtilis.

They are proteins of 430 to 595 residues that seem to contain 12
transmembrane domains.

The best conserved region, which corresponds to what seems to be the tenth
transmembrane domain, has been selected as a signature pattern.

-Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x- 
[LIVM]-x(3)-G 

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C.
J. Biol. Chem. 270:8610-8622(1995).
[ 2] Andersen P.S., Frees D., Fast R., Mygind B.
J. Bacteriol. 177:2008-2013(1995).



723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine
synthases (EC 4.2.1.70) [1], have been shown to share regions of similarity:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit 
pseudouridine synthase A (gene rluA). It is responsible for synthesis of 
pseudouridine from uracil-746 in 23S rRNA.

- Escherichia coli and Haemophilus influenzae ribosomal large subunit 
pseudouridine synthase C (gene rluC). It is responsible for synthesis of 
pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA. 

- Escherichia coli large subunit pseudouridine synthase D (gene rluD) and
its homologs in other bacteria.

- Yeast DRAP deaminase (gene RIB2). 

- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding 
Haemophilus influenzae protein. 

- Haemophilus influenzae hypothetical protein HI0042. 

- Aquifex aeolicus hypothetical protein AQ_1758. 

- Bacillus subtilis hypothetical protein yhcT. 

- Bacillus subtilis hypothetical protein yjbO. 

- Bacillus subtilis hypothetical protein ylyB. 

- Helicobacter pylori hypothetical protein HP0347. 

- Helicobacter pylori hypothetical protein HP0745. 

- Helicobacter pylori hypothetical protein HP0956. 

- Mycoplasma genitalium hypothetical protein MG209. 

- Mycoplasma genitalium hypothetical protein MG370. 

- Synechocystis strain PCC 6803 hypothetical protein slr1592.

- Synechocystis strain PCC 6803 hypothetical protein slr1629.

- Yeast hypothetical protein YDL036c. 

- Yeast hypothetical protein YGR169c. 

- Fission yeast hypothetical protein SpAC18B11.02c.

- Caenorhabditis elegans hypothetical protein K07E8.7. 

These are proteins of 21 to 50 Kd which contain a number of conserved
regions in their central section. They can be picked up in the database by the
following highly conserved pattern.

-Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-
[LIVMF](2)-[LIVMFGC]-[SGTACV]

[ 1] Conrad J., Sun D., Englund N., Ofengand J.
J. Biol. Chem. 273:18562-18566(1998).

In addition, the following bacterial proteins, which seem to belong to a family of
pseudouridine synthases (EC 4.2.1.70) [1], also have been shown to share regions of
similarity:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 
synthase (EC 4.2.1.70) (gene: rsuA). This enzyme is responsible for the 
formation of pseudouridine from uracil-516 in 16S ribosomal RNA. 

- Escherichia coli hypothetical protein yciL and HI1199, the corresponding
Haemophilus influenzae protein. 

- Escherichia coli hypothetical protein yjbC. 

- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding 
Haemophilus influenzae protein. 

- Aquifex aeolicus hypothetical protein AQ_554. 

- Aquifex aeolicus hypothetical protein AQ_1464. 

- Bacillus subtilis hypothetical protein ypuL. 

- Bacillus subtilis hypothetical protein ytzF. 

- Borrelia burgdorferi hypothetical protein BB0129. 

- Helicobacter pylori hypothetical protein HP1459. 

- Synechocystis strain PCC 6803 hypothetical protein slr0361.

- Synechocystis strain PCC 6803 hypothetical protein slr0612. 

These are proteins of 25 to 40 Kd which contain a number of conserved
regions in their central section. They can be picked up in the database by the 
following highly conserved pattern. 



-Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J.
Biochemistry 34:8904-8913(1995).

724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin 

Putative zinc finger; binding not yet shown. 

725. Zinc carboxypeptidase 

There are a number of different types of zinc-dependent carboxypeptidases (EC
3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally
related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can
remove all C-terminal amino acids with the exception of Arg, Lys and Pro.

- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a 
specificity similar to that of carboxypeptidase A1, but with a preference
for bulkier C-terminal residues. 

- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but 
that preferentially removes C-terminal Arg and Lys. 

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), 
a plasma enzyme which protects the body from potent vasoactive and 
inflammatory peptides containing C-terminal Arg or Lys (such as kinins or 
anaphylatoxins) which are released into the circulation. 

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or 
carboxypeptidase E), an enzyme located in secretory granules of pancreatic 
islets, adrenal gland, pituitary and brain. This enzyme removes residual C- 
terminal Arg or Lys remaining after initial endoprotease cleavage during 
prohormone processing.

- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific
enzyme. It is ideally situated to act on peptide hormones at local tissue
sites where it could control their activity before or after interaction with
specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity
similar to that of carboxypeptidase A, but found in the secretory granules
of mast cells.

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which
combines the specificities of mammalian carboxypeptidases A and B. 

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], 
which also combines the specificities of carboxypeptidases A and B. 

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems 
to regulate transcription by cleavage of other transcriptional proteins. 

- Yeast hypothetical protein YHR132c. 

All of these enzymes bind an atom of zinc. Three conserved residues are
implicated in the binding of the zinc atom: two histidines and a glutamic acid.
Two signature patterns which contain these three zinc ligands have been derived.

-Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-
[STAG]-x(6)-[LIVMFYTA]
[H and E are zinc ligands]
-Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW]
[H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).
[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F.,
Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).
[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).
[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I.,
Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P.,
Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).
[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).
[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L.,
Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

726. Zinc finger, C2H2 type 

The C2H2 zinc finger is the classical zinc finger domain. 
The two conserved cysteines and histidines co-ordinate a 
zinc ion. The following pattern describes the zinc finger. 
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
Where X can be any amino acid, and numbers in brackets 
indicate the number of residues. The positions marked # are 
those that are important for the stable fold of the zinc 
finger. The final position can be either his or cys. 
The C2H2 zinc finger is composed of two short beta strands 
followed by an alpha helix. The amino terminal part of the 
helix binds the major groove in DNA binding zinc fingers. 

'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first 
identified in the Xenopus transcription factor TFIIIA. These domains have 
since been found in numerous nucleic acid-binding proteins. A zinc finger 
domain is composed of 25 to 30 amino-acid residues. There are two cysteine or 
histidine residues at both extremities of the domain, which are involved in 
the tetrahedral coordination of a zinc atom. It has been proposed that such a 
domain interacts with about five nucleotides. A schematic representation of a 
zinc finger domain is shown below: 

                    x  x
                   x    x
                  x      x
                  x      x
                  x      x
                  x      x
                    C  H
                 x   \/   x
                 x   Zn   x
                  x  /\  x
                    C  H
          x x x x x      x x x x x

Many classes of zinc fingers are characterized according to the number and 
positions of the histidine and cysteine residues involved in the zinc atom 
coordination. In the first class to be characterized, called C2H2, the first 
pair of zinc coordinating residues are cysteines, while the second pair are 
histidines. A number of experimental reports have demonstrated the zinc- 
dependent DNA or RNA binding property of some members of this class. 

Some of the proteins known to include C2H2-type zinc fingers are listed below. 
The number of zinc finger regions found in each of these proteins is indicated
between brackets; a '+' symbol indicates that only partial sequence
data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2), 
MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1), 
STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2). 

- Emericella nidulans: brlA (2), creA (2). 

- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5),
Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped (4),
Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta
(6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor
of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).

- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin
(37 !!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), Oocyte
XlcOF2 to XlcOF22 (from 7 to 12).

- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like
transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3)
and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4),
EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1
(10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1
(9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX
(13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1
(13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

In addition to the conserved zinc ligand residues it has been shown [6] that a 
number of other positions are also important for the structural integrity of 
the C2H2 zinc fingers. The best conserved position is found four residues 
after the second cysteine; it is generally an aromatic or aliphatic residue. 

-Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H 
[The two C's and two H's are zinc ligands] 
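
The per-protein finger counts listed above can in principle be reproduced by scanning a sequence for repeated, non-overlapping matches of the consensus pattern. The following is a minimal sketch of that idea. The names C2H2_REGEX and find_fingers are hypothetical, and the test sequence is an invented tandem array of one idealized finger, not any of the proteins named in the list.

```python
import re

# PROSITE pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H written as a
# regular expression; the four zinc ligands are captured as groups 1 to 4.
C2H2_REGEX = re.compile(r"(C).{2,4}(C).{3}[LIVMFYWC].{8}(H).{3,5}(H)")

def find_fingers(sequence: str) -> list:
    """Return one tuple of 1-based zinc-ligand positions per matched finger."""
    return [
        tuple(match.start(group) + 1 for group in range(1, 5))
        for match in C2H2_REGEX.finditer(sequence)
    ]

# Hypothetical test sequence: three copies of an idealized finger joined by a
# short linker. It is not taken from any protein listed in this entry.
TOY_FINGER = "YKCPECGKSFSQKSNLQKHQRTH"
TOY_PROTEIN = "TGEKP".join([TOY_FINGER] * 3)

if __name__ == "__main__":
    fingers = find_fingers(TOY_PROTEIN)
    print(len(fingers), "finger(s) found")
    for ligands in fingers:
        print("zinc ligands at positions", ligands)
```

Because finditer reports non-overlapping matches from left to right, a count obtained this way is a lower bound when fingers deviate from the consensus, which may be one reason partial counts are flagged in the list.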

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).
[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).
[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).
[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

727. Zinc finger, C3HC4 type (RING finger) 

A number of eukaryotic and viral proteins contain a conserved cysteine-rich 
domain of 40 to 60 residues (called C3HC4 zinc-finger or 'RING' finger) [1] 
that binds two atoms of zinc, and is probably involved in mediating protein- 
protein interactions. The 3D structure of the zinc ligation system is unique 
to the RING domain and is referred to as the "cross-brace" motif. The spacing 
of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 
3)-C-x(2)-C-x(4 to 48)-C-x(2)-C. 

Proteins currently known to include the C3HC4 domain are listed below 
(references are only provided for recently determined sequences). 

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 
activates the rearrangement of immunoglobulin and T-cell receptor genes. 

- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression 
directed by the promoter region of the interleukin-2 receptor alpha chain 
or the LTR promoter region of HIV-1. 

- Human rfp. Rfp is a developmentally regulated protein that may function in 
male germ cell development. Recombination of the N-terminal section of rfp 
with a protein tyrosine kinase produces the ret transforming protein. 

- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A 
ribonucleoprotein complex. Sera from patients with systemic lupus 
erythematosus or primary Sjogren's syndrome often contain antibodies that 
react with the Ro proteins. 

- Human histocompatibility locus protein RING1. 

- Human PML, a probable transcription factor. Chromosomal translocation of 
PML with retinoic receptor alpha creates a fusion protein which is the 
cause of acute promyelocytic leukemia (APL). 

- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1]. 

- Mammalian cbl proto-oncogene. 

- Mammalian bmi-1 proto-oncogene. 

- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that 
stabilizes the complex between the CDK7 kinase and cyclin H (MAT1 stands 
for 'Menage A Trois'). 

- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor 
cells, is a transcriptional repressor that recognizes and binds a specific 
DNA sequence. 

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat 
involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are 
responsible for a form of Zellweger syndrome, an autosomal recessive 
disorder associated with peroxisomal deficiencies. 

- Human MAT1 protein, which interacts with the CDK7-cyclin H complex. 

- Human RING1 protein. 

- Xenopus XNF7 protein, a probable transcription factor. 

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the 
post-transcriptional regulation of genes in VSG expression sites or may 
interact with adenylate cyclase to regulate its activity. 

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste 
(Su(z)2). The two proteins belong to the Polycomb group of genes needed to 
maintain the segment-specific repression of homeotic selector genes. 

- Drosophila protein male-specific msl-2, a DNA-binding protein which is 
involved in X chromosome dosage compensation (the elevation of 
transcription of the male single X chromosome). 

- Arabidopsis thaliana protein COP1 which is involved in the regulation of 
photomorphogenesis. 

- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8. 

- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein 
which has been characterized in many different herpesviruses is a trans- 
activator and/or -repressor of the expression of many viral and cellular 
promoters. 

- Baculoviruses protein CG30. 

- Baculoviruses major immediate early protein (PE-38). 

- Baculoviruses immediate-early regulatory protein IE-N/IE-2. 

- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1. 

- Yeast hypothetical proteins YER116c and YKR017c. 

The central region of the domain was selected as a signature pattern 
for the C3HC4 finger. 

-Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA] 

[ 1] Borden K.L.B., Freemont P.S. 
Curr. Opin. Struct. Biol. 6:395-401(1996). 

728. Zinc finger C-x8-C-x5-C-x3-H type (and similar). 

729. Zinc finger, CCHC class 

A family of CCHC zinc fingers, mostly from retroviral gag 
proteins (nucleocapsid). Prototype structure is from HIV. 
Also contains members involved in eukaryotic gene 
regulation, such as C. elegans GLH-1. 
Structure is an 18-residue zinc finger; no examples of indels 
in the alignment. 

730. Zn-finger in Ran binding protein and others. 

731. AN1-like Zinc finger 

Zinc finger at the C-terminus of An1 Swiss:Q91889, a ubiquitin-like protein in Xenopus 
laevis. The following pattern describes the zinc finger: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C- 
X2-H-X5-H-X-C, where X can be any amino acid, and numbers in brackets indicate the 
number of residues. 

[1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188. 

732. 14-3-3 proteins 

Structure of a 14-3-3 protein and implications for coordination of multiple 
signalling pathways. 

Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ; 
Nature 1995;376:188-191. 

Crystal structure of the zeta isoform of the 14-3-3 protein. 
Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R; 

Interaction of 14-3-3 with signaling proteins is mediated by the 
recognition of phosphoserine. 
Muslin AJ, Tanner JW, Allen PM, Shaw AS; 
Cell 1996;84:889-897. 

The 14-3-3 protein binds its target proteins with a common site 
located towards the C-terminus. 
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, 
Isobe T; 
FEBS Lett 1997;413:273-276. 

Molecular evolution of the 14-3-3 protein family. 
Wang W, Shakes DC; 
J Mol Evol 1996;43:384-398. 

Function of 14-3-3 proteins. 
Jin DY, Lyu MS, Kozak CA, Jeang KT; 
Nature 1996;382:308-308. 

The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric 
proteins of about 30 Kd which were first identified as being very abundant in 
mammalian brain tissues and located preferentially in neurons. The 14-3-3 
proteins seem to have multiple biological activities and play a key role in 
signal transduction pathways and the cell cycle. They interact with kinases 
such as PKC or Raf-1; they seem to also function as protein-kinase dependent 
activators of tyrosine and tryptophan hydroxylases and in plants they are 
associated with a complex that binds to the G-box promoter elements. 

The 14-3-3 family of proteins is found ubiquitously in all eukaryotic species 
studied and has been sequenced in fungi (yeast BMH1 and BMH2, fission yeast 
rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 
14-3-3 proteins are extremely well conserved. Two highly conserved regions have 
been selected as signature patterns: the first is a peptide of 11 residues 
located in the N-terminal section; the second, a 20 amino acid region located 
in the C-terminal section. 

-Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA] 
-Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W- 
[TAN]-[SAD] 
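
Because the 14-3-3 family is described by two signatures, a scan typically reports each one separately. The sketch below is a hedged illustration in Python: the two regular expressions are hand translations of the consensus patterns above, while the function name find_signatures, the dictionary keys and the toy sequence are assumptions made for this example.

```python
import re

# Hand-translated regular expressions for the two consensus patterns above.
SIGNATURES = {
    "14-3-3 signature 1": re.compile(r"RNL[LIV]S[VG][GA]Y[KN]N[IVA]"),
    "14-3-3 signature 2": re.compile(
        r"YK[DE]STLI[IM]QL[LF][RHC]DN[LF]T[LS]W[TAN][SAD]"
    ),
}

def find_signatures(sequence):
    """Return the 0-based start of each signature, or None if it is absent."""
    result = {}
    for name, regex in SIGNATURES.items():
        match = regex.search(sequence)
        result[name] = match.start() if match else None
    return result

# Toy sequence assembled for illustration only; it is not a real 14-3-3 protein.
site1 = "RNLLSVAYKNV"            # 11 residues, N-terminal signature
site2 = "YKDSTLIMQLLRDNLTLWTS"   # 20 residues, C-terminal signature
toy = "MAAA" + site1 + "GGGG" + site2 + "EEE"

hits = find_signatures(toy)
for name, start in hits.items():
    print(name, start)
```

Reporting the two positions separately makes it easy to see when only one of the two conserved regions is present in a fragment.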

[ 1] Aitken A. 
Trends Biochem. Sci. 20:95-97(1995). 
[ 2] Morrison D. 
Science 266:56-57(1994). 
[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., 
Gamblin S.J. 
Nature 376:188-191(1995). 



733. D-isomer specific 2-hydroxyacid dehydrogenases (2 Hacid DH) 

This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and 
D-lactate dehydrogenase families in SCOP. A number of NAD-dependent 2- 
hydroxyacid dehydrogenases which seem to be specific for the D-isomer of their 
substrate have been shown [1,2,3,4] to be functionally and structurally related. These 
enzymes are listed below. 

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the 
reduction of pyruvate to D-lactate. 

- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate 
reductase), a plant leaf peroxisomal enzyme that catalyzes the reduction of 
hydroxypyruvate to glycerate. This reaction is part of the glycolate pathway of 
photorespiration. 

- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum 
and Methylobacterium extorquens. 

- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that 
catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate. 
This reaction is the first committed step in the 'phosphorylated' pathway of serine 
biosynthesis. 

- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial 
enzyme involved in the biosynthesis of pyridoxine (vitamin B6). 

- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial 
enzyme that catalyzes the reversible and stereospecific interconversion between 2- 
ketocarboxylic acids and D-2-hydroxy-carboxylic acids. 

- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp. 
101 and various fungi [5]. 

- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a 
D-specific alpha-keto acid dehydrogenase involved in the formation of a 
peptidoglycan which does not terminate in D-alanine, thus preventing 
vancomycin binding. 

- Escherichia coli hypothetical protein ycdW. 

- Escherichia coli hypothetical protein yiaE. 

- Haemophilus influenzae hypothetical protein HI1556. 

- Yeast hypothetical protein YER081w. 

- Yeast hypothetical protein YIL074w. 

All these enzymes have similar enzymatic activities and are structurally related. Three 
of the most conserved regions of these proteins have been selected to develop patterns. The 
first pattern is based on a glycine-rich region located in the central section of these enzymes; 
this region probably corresponds to the NAD-binding domain. The two other patterns contain 
a number of conserved charged residues, some of which may play a role in the catalytic 
mechanism. 

-Consensus pattern: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]- 
G-x(13,14)-[LIVfMT]-x(2)-[FYwCTH]-[DNSTK] 

-Consensus pattern: [LIVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x- 
[LIVF]-[HNI]-x-P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN] 

-Consensus pattern: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x- 
[STAGC]-R-[GP]-x-[LIVH]-[LIVMC]-[DNV] 

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989). 
[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. 
Res. Commun. 184:60-66(1992). 
[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991). 
[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994). 
[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994). 

734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain) 
Refined crystal structure of the catalytic domain of dihydrolipoyl 
transacetylase (E2p) from Azotobacter vinelandii at 2.6 angstroms 
resolution. 

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG; 
J Mol Biol 1993;230:1183-1199. 

These proteins contain one to three copies of a lipoyl binding domain 
followed by the catalytic domain. 

735. 3-beta hydroxysteroid dehydrogenase/isomerase family 
Structure and tissue-specific expression of 3 
beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase 
genes in human and rat classical and peripheral 
steroidogenic tissues. 

Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A, 
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al 
J Steroid Biochem Mol Biol 1992;41:421-435. 

The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene 
isomerase (3 beta-HSD) catalyzes the oxidation and isomerization 
of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene 
steroid precursors into the corresponding 4-ene-ketosteroids necessary 
for the formation of all classes of steroid hormones. 

736. 3-hydroxyacyl-CoA dehydrogenase 
This family also includes lambda crystallin. 
Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: 
preliminary chain tracing at 2.8-A resolution. 
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ; 
Proc Natl Acad Sci U S A 1987;84:8262-8266. 

3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved 
in fatty acid metabolism; it catalyzes the oxidation of 3-hydroxyacyl-CoA to 
3-oxoacyl-CoA. Most eukaryotic cells have 2 fatty-acid beta-oxidation systems, 
one located in mitochondria and the other in peroxisomes. In peroxisomes 
3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 
3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme where the N- 
terminal domain bears the hydratase/isomerase activities and the C-terminal 
domain the dehydrogenase activity. There are two mitochondrial enzymes: one 
which is monofunctional and the other which is, like its peroxisomal 
counterpart, multifunctional. 

In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) HCDH is part 
of a multifunctional enzyme which also contains an ECH/ECI domain as well as a 
3-hydroxybutyryl-CoA epimerase domain [2]. 

The other proteins structurally related to HCDH are: 

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157) which oxidizes 
3-hydroxybutanoyl-CoA to acetoacetyl-CoA [3]. 
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs 
(such as rabbit). 

There are two major regions of similarity in the sequences of proteins of the 
HCDH family: the first one, located in the N-terminal section, corresponds to the NAD- 
binding site; the second one is located in the center of the sequence. A signature 
pattern has been derived from this central region. 

-Consensus pattern: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)- 
x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV] 

[ 1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. 
Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987). 
[ 2] Nakahigashi K., Inokuchi H. 
Nucleic Acids Res. 18:4937-4937(1990). 
[ 3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., 
Tabaqchali S. 
FEMS Microbiol. Lett. 124:61-67(1994). 
[ 4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., 
de Jong W.W. 
J. Biol. Chem. 263:15462-15466(1988). 

737. 60s Acidic ribosomal protein 

Proteins P1, P2, and P0, components of the eukaryotic 
ribosome stalk. New structural and functional aspects. 
Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R, 
Rodriguez Gabriel MA, Guarinos E, Ballesta JP; 
Biochem Cell Biol 1995;73:959-968. 

This family includes archaebacterial L12, eukaryotic P0, P1 and P2. 

738. 6-phosphogluconate dehydrogenases 

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step 
in the hexose monophosphate shunt, the oxidative decarboxylation of 
6-phosphogluconate into ribulose 5-phosphate. 

Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose 
sequences are highly conserved [1]. A region which has been shown [2], from studies 
of the sheep 6PGD tertiary structure, to be involved in the binding of 6-phosphogluconate 
has been selected as a signature pattern. 

-Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W 

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. 
Mol. Microbiol. 5:1081-1089(1991). 
[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S., 



Helliwell J.R., Pickersgill R.W., White S.W. 
EMBO J. 2:1009-1014(1983). 



739. (7tm 1) G-protein coupled receptors 

G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive 
group of receptors for hormones, neurotransmitters, odorants and light which 
transduce extracellular signals by interaction with guanine nucleotide- 
binding (G) proteins. The receptors that are currently known to belong to this 
family are listed below. 

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5]. 

- Acetylcholine, muscarinic-type, M1 to M5. 

- Adenosine A1, A2A, A2B and A3 [6]. 

- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7]. 

- Angiotensin II types I and II. 

- Bombesin subtypes 3 and 4. 

- Bradykinin B1 and B2. 

- C3a and C5a anaphylatoxin. 

- Cannabinoid CB1 and CB2. 

- Chemokines C-C CC-CKR-1 to CC-CKR-8. 

- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4. 

- Cholecystokinin-A and cholecystokinin-B/gastrin. 

- Dopamine D1 to D5 [8]. 

- Endothelin ET-a and ET-b [9]. 

- fMet-Leu-Phe (fMLP) (N-formyl peptide). 

- Follicle stimulating hormone (FSH-R) [10]. 

- Galanin. 

- Gastrin-releasing peptide (GRP-R). 

- Gonadotropin-releasing hormone (GNRH-R). 

- Histamine H1 and H2 (gastric receptor I). 

- Lutropin-choriogonadotropic hormone (LSH-R) [10]. 

- Melanocortin MC1R to MC5R. 

- Melatonin. 

- Neuromedin B (NMB-R). 

- Neuromedin K (NK-3R). 

- Neuropeptide Y types 1 to 6. 

- Neurotensin (NT-R). 

- Octopamine (tyramine), from insects. 

- Odorants [11]. 

- Opioids delta-, kappa- and mu-types [12]. 

- Oxytocin (OT-R). 

- Platelet activating factor (PAF-R). 

- Prostacyclin. 

- Prostaglandin D2. 

- Prostaglandin E2, EP1 to EP4 subtypes. 

- Prostaglandin F2. 

- Purinoreceptors (ATP) [13]. 

- Somatostatin types 1 to 5. 

- Substance-K (NK-2R). 

- Substance-P (NK-1R). 

- Thrombin. 

- Thromboxane A2. 

- Thyrotropin (TSH-R) [10]. 

- Thyrotropin releasing factor (TRH-R). 

- Vasopressin V1a, V1b and V2. 

- Visual pigments (opsins and rhodopsin) [14]. 

- Proto-oncogene mas. 

- A number of orphan receptors (whose ligand is not known) from mammals and 
birds. 

- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, 
T27D1.3 and ZC84.4. 

- Three putative receptors encoded in the genome of cytomegalovirus: US27, 
US28, and UL33. 

- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri. 

The structure of all these receptors is thought to be identical. They have 
seven hydrophobic regions, each of which most probably spans the membrane. 
The N-terminus is located on the extracellular side of the membrane and is 
often glycosylated, while the C-terminus is cytoplasmic and generally 
phosphorylated. Three extracellular loops alternate with three intracellular 
loops to link the seven transmembrane regions. Most, but not all of these 
receptors, lack a signal peptide. The most conserved parts of these proteins 
are the transmembrane regions and the first two cytoplasmic loops. A conserved 
acidic-Arg-aromatic triplet is present in the N-terminal extremity of the 
second cytoplasmic loop [15] and could be implicated in the interaction with G 
proteins. 

To detect this widespread family of proteins, a pattern that contains the conserved 
triplet and that also spans the major part of the third transmembrane helix has 
been developed. 

-Consensus pattern: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)- 
[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM] 
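
The receptor pattern differs from the others in this section in that it contains an excluded set, {EDPKRH}, and embeds the conserved acidic-Arg-aromatic triplet. The following Python sketch, offered only as an illustration, shows one way to express the excluded set as a negated character class and to recover the triplet with a capture group. The variable names and the toy segment are assumptions and do not correspond to any receptor named above.

```python
import re

# Hand translation of the consensus pattern above.
# {EDPKRH} (residues that must NOT occur) becomes the negated class [^EDPKRH];
# the parentheses capture the acidic-Arg-aromatic triplet [DENH]-R-[FYWCSH].
GPCR_SIGNATURE = re.compile(
    r"[GSTALIVMFYWC][GSTANCPDE][^EDPKRH].{2}[LIVMNQGA].{2}"
    r"[LIVMFT][GSTANC][LIVMFYWSTAC]([DENH]R[FYWCSH]).{2}[LIVM]"
)

# Toy segment built to satisfy the pattern; illustrative only.
segment = "ASLAALAAISLDRYAAV"
toy = "KKK" + segment + "KK"

match = GPCR_SIGNATURE.search(toy)
if match:
    # Positions are 0-based offsets into the toy sequence.
    print("signature starts at", match.start())
    print("triplet", match.group(1), "starts at", match.start(1))
```

Capturing the triplet separately is a design choice that makes it straightforward to check which acidic and aromatic residues a given hit actually carries.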

[ 1] Strosberg A.D. 
Eur. J. Biochem. 196:1-10(1991). 
[ 2] Kerlavage A.R. 
Curr. Opin. Struct. Biol. 1:394-401(1991). 
[ 3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C. 
DNA Cell Biol. 11:1-20(1992). 
[ 4] Savarese T.M., Fraser C.M. 
Biochem. J. 283:1-9(1992). 
[ 5] Branchek T. 
Curr. Biol. 3:315-317(1993). 
[ 6] Stiles G.L. 
J. Biol. Chem. 267:6451-6454(1992). 
[ 7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G. 
Trends Neurosci. 11:321-324(1988). 
[ 8] Stevens C.F. 
Curr. Biol. 1:20-22(1991). 
[ 9] Sakurai T., Yanagisawa M., Masaki T. 
Trends Pharmacol. Sci. 13:103-107(1992). 
[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J. 
Biochimie 73:109-120(1991). 
[11] Lancet D., Ben-Arie N. 
Curr. Biol. 3:668-674(1993). 
[12] Uhl G.R., Childers S., Pasternak G. 
Trends Neurosci. 17:89-93(1994). 
[13] Barnard E.A., Burnstock G., Webb T.E. 
Trends Pharmacol. Sci. 15:67-70(1994). 
[14] Applebury M.L., Hargrave P.A. 
Vision Res. 26:1881-1895(1986). 
[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C. 
Gene 98:153-159(1991). 

(7tm 1) Visual pigments (opsins) retinal binding site 

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. 
They consist of an apoprotein, opsin, covalently linked to the chromophore 
cis-retinal. Vision is effected through the absorption of a photon by cis- 
retinal which is isomerized to trans-retinal. This isomerization leads to a 
change of conformation of the protein. Opsins are integral membrane proteins 
with seven transmembrane regions that belong to family 1 of G-protein coupled 
receptors. 

In vertebrates four different pigments are generally found. Rod cells, which 
mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which 
function in bright light, are responsible for color vision and contain three 
or more color pigments (for example, in mammals: red, blue and green). 

In Drosophila, the eye is composed of 800 facets or ommatidia. Each 
ommatidium contains eight photoreceptor cells (R1-R8): the R1 to R6 cells are 
outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6, 
R7 and R8) expresses a specific opsin. 

Proteins evolutionarily related to opsins include squid retinochrome, also known 
as retinal photoisomerase, which converts various isomers of retinal into 11- 
cis retinal and mammalian retinal pigment epithelium (RPE) RGR [3], a protein 
that may also act in retinal isomerization. 

The attachment site for retinal in the above proteins is a conserved lysine 
residue in the middle of the seventh transmembrane helix. The pattern 
that has been developed includes this residue. 

-Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]- 
x(2)-[DENF]-[AP]-x(2)-[IY] 

[K is the retinal binding site] 

[ 1] Applebury M.L., Hargrave P.A. 
Vision Res. 26:1881-1895(1986). 
[ 2] Fryxell K.J., Meyerowitz E.M. 
J. Mol. Evol. 33:367-378(1991). 
[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W. 
Biochemistry 33:13117-13125(1994). 

The following descriptions of protein family functions are not provided by the Pfam or 
Prosite databases. 

740. BAH 

BAH domain. Number of members: 65 

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing 
multiple domains including five bromodomains, a truncated HMG-box, and two repeats of a 
novel domain. Nicolas RH, Goodwin GH; Gene 1996;175:233-240. 
[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between 
DNA methylation, replication and transcriptional regulation. Callebaut I, Courvalin J-C, 
Mornon JP; FEBS Lett 1999;446:189-193. 

741. ELM2. 

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of 
unknown function. Number of members: 10 

742. Euk porin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial 
membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective 
channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules [1 
to 4]. The channel adopts an open conformation at low or zero membrane potential and a 
closed conformation at potentials above 30-40 mV. 

This protein contains about 280 amino acids and its sequence is composed of 12 
to 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two 
members of this family (genes POR1 and POR2); vertebrates have at least three members 
(genes VDAC1, VDAC2 and VDAC3) [5]. 

A conserved region located at the C-terminal part of these proteins was selected as a 
signature pattern. 

Consensus pattern: [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]- 
[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY] 

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994). 
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992). 
[ 3] Dihanich M. Experientia 46:146-153(1990). 

[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987). 

[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996). 

743. Glyco hydro 19 
Chitinases family 19 signatures 

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2 

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N- 
acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence 
similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl 
hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) 
are enzymes from plants that function in the defense against fungal and insect pathogens 
by destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the 
presence (IA/I) or absence (IB/II) of a N-terminal chitin-binding domain (see the relevant 
entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 
amino acid residues. 

Two highly conserved regions were selected as signature patterns. The first one is located in 
the N-terminal section and contains one of the six cysteines which are conserved in most, 
if not all, of these chitinases and which is probably involved in a disulfide bond. 

Consensus patternC-x(4,5)-F-Y-[ST]^ 
Consensus pattern[LIVM]-[GSA]-F-x^ 

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992). 
[ 2] Henrissat B. Biochem. J. 280:309-316(1991). 

744. MBD 

Methyl-CpG binding domain 

The Methyl-CpG binding domain (MBD) binds to DNA that contains one or more 
symmetrically methylated CpGs [1]. DNA methylation in animals is associated with 
alterations in chromatin structure and silencing of gene expression. MBD has negligible non- 
specific affinity for DNA. In vitro foot-printing with MeCP2 showed the MBD can protect a 
12 nucleotide region surrounding a methyl CpG pair [1]. MBDs are found in several Methyl- 
CpG binding proteins and also DNA demethylase [2]. Number of members: 11 

[1] Medline: 94232813. Dissection of the methyl-CpG binding domain from the chromosomal 
protein MeCP2. Nan X, Meehan RR, Bird A; Nucleic Acids Res 1993;21:4886-4892. 
[2] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG 
DNA. Bhattacharya SK, Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583. 



745. Peptidase C1 

Eukaryotic thiol (cysteine) proteases active sites 

cross-reference(s) THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS; 
THIOL_PROTEASE_ASN 

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which 
contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is 
facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic 
triad. The proteases which are currently known to belong to this family are listed below 
(references are only provided for recently determined sequences). 

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), 
and S (EC 3.4.22.27) [2]. 

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) 
[2]. 

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol 
proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding 
domain. 

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3]. 

- Human cathepsin O [4]. 

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug 
BLM (a glycopeptide). 

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean 
SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), 
chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); 
pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; 
rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis 
thaliana A494, RD19A and RD21A. 

- House-dust mites allergens DerP1 and EurM1. 

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, 
cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen 
SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and 
CP-3). 

- Slime mold cysteine proteinases CP1 and CP2. 

- Cruzipain from Trypanosoma cruzi and brucei. 

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species. 

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva. 

- Baculoviruses cathepsin-like enzyme (v-cath). 

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a 
calpain-like domain. 

- Yeast thiol protease BLH1/YCP1/LAP3. 

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein. 

Two bacterial peptidases are also part of this family: 

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 

- Thiol protease tpr from Porphyromonas gingivalis. 

Three other proteins are structurally related to this family, but may have lost their 
proteolytic activity. 

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a 
glycine. 

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the 
active site cysteine replaced by a serine. Rat testin should not be confused with mouse 
testin which is a LIM-domain protein (see <PDOC00382>). 

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. 
This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active 
site cysteine is replaced by a serine. 

The sequences around the three active site residues are well conserved and can be used as 
signature patterns. 

Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] 
[C is the active site residue] 

Note the residue in position 4 of the pattern is almost always cysteine; the only exceptions are 
calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser). Note the residue in position 
5 of the pattern is always Gly except in papaya protease IV where it is Glu. 
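
The note on positions 4 and 5 can be checked mechanically once a hit for the cysteine active-site signature has been found. The Python sketch below is a minimal illustration under stated assumptions: the regular expression is a hand translation of the pattern Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV], and the function name describe_cys_site, the returned keys and the toy sequence are hypothetical.

```python
import re

# Hand translation of the cysteine active-site signature:
# Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV]
CYS_SITE = re.compile(r"Q.{3}[GE].C[YW].{2}[STAGC][STAGCV]")

def describe_cys_site(sequence):
    """Report the residues at pattern positions 4 and 5 and the active-site index."""
    match = CYS_SITE.search(sequence)
    if match is None:
        return None
    hit = match.group()
    return {
        "start": match.start(),                   # 0-based start of the hit
        "position_4": hit[3],                     # 4th residue of the pattern
        "position_5": hit[4],                     # 5th residue of the pattern
        "active_site_index": match.start() + 6,   # the C that follows [GE]-x
    }

# Toy sequence for illustration only.
toy = "AAP" + "QGSCGSCWAFSA" + "VV"
info = describe_cys_site(toy)
print(info)
```

A caller could then compare position_4 and position_5 against the exceptions listed in the note.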

Consensus pattern: [LIVMGSTAN]-x-H-^ 
[H is the active site residue] 

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FW^ 
[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue] 

Note: these proteins belong to families C1 (papain-type) and C2 (calpains) in the 
classification of peptidases [7,E1]. 

[ 1] Dufour E. Biochimie 70:1335-1342(1988). 
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995). 
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 
357:129-134(1995). 
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 
269:27136-27142(1994). 
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. 
Microbiol. 59:330-333(1993). 
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989). 
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994). 

746. Peptidase M22 

Glycoprotease family signature 

cross-reference(s) GLYCOPROTEASE 

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, 
is a metalloprotease secreted by Pasteurella haemolytica which specifically 
cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is 
highly similar to the following uncharacterized proteins: 

- Escherichia coli hypothetical protein ygjD (ORF-X). 

- Bacillus subtilis hypothetical protein ydiE. 

- Mycobacterium leprae hypothetical protein U229E. 

- Mycobacterium tuberculosis hypothetical protein MtCY78.10. 

- Synechocystis strain PCC 6803 hypothetical protein slr0807. 

- Methanococcus jannaschii hypothetical protein MJ1130. 

- Haloarcula marismortui hypothetical protein in HSH 3' region.

- Yeast hypothetical protein YKR038c. 

- Yeast hypothetical protein QRI7. 

One of the conserved regions contains two conserved histidines. It is possible 
that this region is involved in coordinating a metal ion such as zinc. 

Consensus pattern [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]

Note these proteins belong to family M22 in the classification of 
peptidases [2,E1]. 

[ 1]Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2]Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 

747. SAM. SAM domain (Sterile alpha motif) 

It has been suggested that SAM is an evolutionarily conserved protein binding domain that is 
involved in the regulation of numerous developmental processes in diverse eukaryotes. The 
SAM domain can potentially function as a protein interaction module through its ability to
homo- and heterooligomerise with other SAM domains. Number of members: 81 

[1]Medline: 96100659 SAM: A novel motif in yeast sterile alpha and Drosophila
polyhomeotic proteins Ponting CP; Prot Sci 1995;4:1928-1930.

[2]Medline: 97160498 SAM as a protein interaction domain involved in developmental
regulation. Schultz J, Ponting CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3]Medline: 99101382 The crystal structure of an Eph receptor SAM domain reveals a
mechanism for modular dimerization. Stapleton D, Balan I, Pawson
T, Sicheri F; Nat Struct Biol 1999;6:44-49.

748. Tyrosinase signatures cross-reference(s) TYROSINASE_1; TYROSINASE_2
Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that catalyzes the
hydroxylation of monophenols and the oxidation of o-diphenols to o-quinones.
This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the 
formation of pigments such as melanins and other polyphenolic compounds. 

Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has
been shown [2] to be bound by three conserved histidine residues. The regions
around these copper-binding ligands are well conserved and also shared by some
hemocyanins, which are copper-containing oxygen carriers from the hemolymph of
many molluscs and arthropods [3,4].

At least two proteins related to tyrosinase are known to exist in mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic
acid (DHICA) to indole-5,6-quinone-2-carboxylic acid.

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase
(EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to DHICA. TRP-2
differs from tyrosinases and TRP-1 in that it binds two zinc ions instead
of copper [7].

Other proteins that belong to this family are: 

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze the oxidation
of mono- and o-diphenols to o-diquinones [8]. 

- Caenorhabditis elegans hypothetical protein C02C2.1. 

Two signature patterns for tyrosinase and related proteins have been derived.
The first one contains two of the histidines that bind CuA, and is located in
the N-terminal section of tyrosinase. The second pattern contains a histidine
that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E
[The two H's are copper ligands]

Consensus pattern D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper
ligand]
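
The text places the CuA signature in the N-terminal section and the CuB signature in the central section. A rough way to check where a hit falls is to express its start as a fraction of the sequence length, as in the sketch below. The regular expressions are hand translations of the two consensus patterns; the function name and the 100-residue toy sequence are assumptions invented for this illustration and do not come from any database.

```python
import re

# Hand-translated regular expressions for the two consensus patterns above
# (x -> ".", x(4,5) -> ".{4,5}").
SIGNATURES = {
    "CuA": re.compile(r"H.{4,5}F[LIVMFTP].[FW]HR.{2}[LM].{3}E"),
    "CuB": re.compile(r"DP.F[LIVMFYW].{2}H.{3}D"),
}


def relative_positions(sequence: str) -> dict:
    """Return, per signature, the start of each hit as a fraction of length."""
    length = len(sequence)
    return {
        name: [round(m.start() / length, 2) for m in regex.finditer(sequence)]
        for name, regex in SIGNATURES.items()
    }


# Hypothetical 100-residue toy sequence: glycine filler around two motifs.
toy = "G" * 10 + "HSPAAFLPWHRAYLLALE" + "G" * 22 + "DPIFWLHHANVD" + "G" * 38
print(relative_positions(toy))
```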

[ 1]Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).

[ 2]Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).

[ 3]Linzen B. Naturwissenschaften 76:206-211(1989).

[ 4]Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).

[ 5]Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T.,
Solano F., Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).

[ 6]Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A.,
Hearing V. EMBO J. 11:527-535(1992).

[ 7]Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A.
Biochem. Biophys. Res. Commun. 204:1243-1250(1994).

[ 8]Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

749. (Mur Ligase) Folylpolyglutamate synthase signatures 

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism 
that catalyzes ATP-dependent addition of glutamate moieties to tetrahydrofolate. 

Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. 
We developed two signature patterns based on the conserved regions which are rich in 
glycine residues and could play a role in the catalytic
activity and/or in substrate binding.

Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-
x(3)-[GSK]

Consensus pattern [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[ 1]Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med.
Biol. 338:629-634(1993).

750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature 
The majority of zinc-dependent metallopeptidases (with the notable exception of the 
carboxypeptidases) share a common pattern of primary structure [1,2,3] in the part of their 
sequence involved in the binding of zinc, and can be grouped together as a 
superfamily, known as the metzincins, on the basis of this sequence similarity. They can be
classified into a number of distinct families [4,E1] which are listed below along with the
proteases which are currently known to belong to these families. 



Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).

- Mammalian aminopeptidase N (EC 3.4.11.2). 

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a 
role in regulating growth and differentiation of early B-lineage cells. 

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1). 

- Yeast hypothetical protein YIL137c. 

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of 
an epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is 
capable of peptidase activity. 

Family M2 

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the
enzyme responsible for hydrolyzing angiotensin I to angiotensin II. There are two forms
of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.

Family M3 

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic 
degradation of small peptides. 

- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal 
endopeptidase). 

- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the
second stage of processing of some proteins imported in the mitochondrion.

- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene
dcp). 

- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC). 

- Yeast hypothetical protein YKL134c. 

Family M4 

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases 
(bacillolysins) (EC 3.4.24.28) from various species of Bacillus. 

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB). 

- Extracellular elastase from Staphylococcus epidermidis. 
- Extracellular protease prt1 from Erwinia carotovora.

- Extracellular minor protease smp from Serratia marcescens. 

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio. 

- Protease prtA from Listeria monocytogenes. 

- Extracellular proteinase proA from Legionella pneumophila.

Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of 
insect antibacterial proteins, attacins and cecropins. 

Family M7 

- Streptomyces extracellular small neutral proteases.

Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from 
various species of Leishmania. 

Family M9 

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio 
alginolyticus. 

Family M10A 

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia. 

- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA). 

- Secreted proteases A, B, C and G from Erwinia chrysanthemi. 

- Yeast hypothetical protein YIL108w. 

Family M10B 

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC
3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC
3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34)
(neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22)
(stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).

- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the 
embryo to digest the protective envelope derived from the egg extracellular matrix. 

- Soybean metalloendoproteinase 1. 

Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.

- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border
metalloendopeptidase. 

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone 
formation and which expresses metalloendopeptidase activity. The Drosophila homolog 
of BMP-1 is the dorsal-ventral patterning protein tolloid. 

- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN 
from Strongylocentrotus purpuratus. 

- Caenorhabditis elegans protein toh-2. 

- Caenorhabditis elegans hypothetical protein F42A10.8. 

- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE
and HCE) from the fish Oryzias latipes. These proteases participate in the breakdown
of the egg envelope, which is derived from the egg extracellular matrix, at the time of
hatching.

Family M12B 

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in 
hemorrhage. Examples are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 
3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72), trimerelysin I (EC 
3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 



Family M13 

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of
endothelin to release the active peptide.

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein
is very probably a zinc endopeptidase.

- Peptidase O from Lactococcus lactis (gene pepO). 

Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins
(BoNT). These toxins are zinc proteases that block neurotransmitter release by
proteolytic cleavage of synaptic proteins such as synaptobrevins, syntaxin and SNAP-25.

Family M30

- Staphylococcus hyicus neutral metalloprotease.

Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme
from Thermus aquaticus which is most active at high temperature.

Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the 
anthrax toxin. 

Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various 
species of Aspergillus. 

Family M36 

- Extracellular elastinolytic metalloproteinases from Aspergillus.

From the tertiary structure of thermolysin, the position of the residues acting as zinc 
ligands and those involved in the catalytic activity are known. Two of the zinc ligands are 
histidines which are very close together in the sequence; C-terminal to the first histidine is 
a glutamic acid residue which acts as a nucleophile and promotes the attack of a water
molecule on the carbonyl carbon of the substrate. A signature pattern which includes the
two histidine and the glutamic acid residues is sufficient to detect this superfamily of
proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]
[The two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL,
except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT: 55; including Neurospora
crassa conidiation-specific protein 13 which could be a
zinc-protease.
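
The zinc-binding signature combines an allowed-residue group, an excluded-residue group ({DEHRKP}) and three annotated residues. The following sketch, under the assumption that a plain regular expression is an acceptable stand-in for a pattern scanner, reports the 1-based positions of the two histidine ligands and the glutamate for each hit. The function name, group names and both toy peptides are invented for this illustration.

```python
import re

# Hand-translated form of the consensus pattern above; {DEHRKP} becomes a
# negated character class. The named groups mark the annotated residues.
ZINC_SITE = re.compile(
    r"[GSTALIVN].{2}(?P<his1>H)(?P<glu>E)[LIVMFYW][^DEHRKP](?P<his2>H).[LIVMFYWGSPQ]"
)


def zinc_site_residues(sequence: str) -> list:
    """Return 1-based positions of the two His ligands and the Glu per hit."""
    return [
        {name: m.start(name) + 1 for name in ("his1", "glu", "his2")}
        for m in ZINC_SITE.finditer(sequence)
    ]


# Hypothetical toy peptides, invented for this illustration.
print(zinc_site_residues("MSDVVAHELTHAVTDK"))
print(zinc_site_residues("MSDVVAHELDHAVTDK"))  # D is excluded by {DEHRKP}
```

The second peptide differs from the first only at the excluded position, which is enough to suppress the match.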

[ 1]Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).

[ 2]Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).

[ 3]Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B.,
Stoecker W. Zoology 99:237-246(1996).

[ 4]Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[ 5]Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 6]Hite L.A., Fox J.W., Bjarnason J.B.

[ 7]Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).

[ 8]Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

751. PseudoU_synt_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon
stem and loop of transfer-RNAs. Pseudouridine is an isomer of uridine
(5-(beta-D-ribofuranosyl)uracil), and is the most abundant modified nucleoside found in all cellular
RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1]Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces
cerevisiae contains one atom of zinc essential for its native conformation and tRNA
recognition. Arluison V, Hountondji C, Robert B, Grosjean H; Biochemistry 1998;37:7268-7276.

752. EPSP synthase signatures 

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the
sixth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate
pathway) in bacteria (gene aroA), plants and fungi (where it is part of a multifunctional
enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase has been
extensively studied as it is the target of the potent herbicide glyphosate which inhibits the
enzyme.

The sequence of EPSP from various biological sources shows that the structure of the enzyme
has been well conserved throughout evolution. Two conserved regions were selected as
signature patterns. The first pattern corresponds to a region that is part of the active site and
which is also important for the resistance to glyphosate [2]. The second pattern is located in
the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.

Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]

Consensus pattern [KR]-x-[KM]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]
[LIVMF]-x-[KRA]-[LIVMF]-G

[ 1]Stallings W.C., Abdel-Megid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber
N.K., Stegeman R.A., Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc.
Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).

[ 2]Padgette S.R., Re D.B., Gaser C.S., Eicholtz D.A., Frazier R.B., Hironaka C.M., Levine
E.B., Shah D.M., Fraley R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

753. Glyco_hydro_18 

Glycosyl hydrolases family 18. Number of members: 173 

[1]Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis
A, Tews I, Dauter Z, Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 
1994;2:1169-1180. 

754. Esterase 
Putative esterase 

This family contains Esterase D Swiss:P10768. However it is not clear if all members of the 
family have the same function. This family is possibly related to the COesterase family. 
Number of members: 36 

755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of 
proteins that transport or detoxify heavy metals. This domain contains two conserved 
cysteines that could be involved in the binding of these metals. The domain has been 
termed Heavy-Metal-Associated (HMA). It has been found in: 

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The
human copper ATPases ATP7A and ATP7B, which are respectively involved in
Menkes' and Wilson's diseases, both contain 6 tandem copies of the
HMA domain. The copper ATPases CCC2 from budding yeast, copA from
Enterococcus faecalis and synA from Synechococcus contain one copy of the HMA
domain. The cadmium ATPases cadA from Bacillus firmus and from plasmid pI258
from Staphylococcus aureus also contain a single HMA domain, while a chromosomal
Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases
that contain the HMA domain are: fixI from Rhizobium meliloti, pacS from
Synechococcus (strain PCC 7942), Mycobacterium leprae ctpA and ctpB and
Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s)
are located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally encoded by plasmids 
carried by mercury-resistant Gram-negative bacteria. Mercuric reductase is a class-1
pyridine nucleotide-disulphide oxidoreductase (see <PDOC00073>). There is 
generally one HMA domain (with the exception of a chromosomal merA from 
Bacillus strain RC607 which has two) in the N-terminal part of merA. 

- Mercuric transport protein periplasmic component (gene merP), also encoded by 
plasmids carried by mercury-resistant Gram-negative bacteria. It seems to be a 
mercury scavenger that specifically binds to one Hg(2+) ion and which passes it to 
the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA 
domain. 

- Helicobacter pylori copper-binding protein copP. 

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of 
copper. 

The consensus pattern for HMA spans the complete domain. 
Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-
[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The two C's probably bind metals]
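
Because the pattern spans the complete domain, counting non-overlapping hits gives a rough estimate of how many HMA copies a protein carries (one, two or six in the examples listed above). The sketch below assumes the variable spacer reads x(9,11) and uses a hand-translated regular expression; the function name, the toy domain and the linker are invented for this illustration.

```python
import re

# Hand-translated form of the HMA consensus pattern above; the two capture
# groups mark the cysteines annotated as probable metal ligands.
HMA = re.compile(
    r"[LIVN].{2}[LIVMFA].(C).[STAGCDNH](C).{3}[LIVFG].{3}[LIV].{9,11}[IVA].[LVFYS]"
)


def hma_copies(sequence: str) -> list:
    """Return the 1-based cysteine positions of each non-overlapping HMA hit."""
    return [(m.start(1) + 1, m.start(2) + 1) for m in HMA.finditer(sequence)]


# Hypothetical toy domain and linker, invented for this illustration.
toy_domain = "LTVIGCASCVNRIEKALKKLPGVESAVNL"
toy_protein = toy_domain + "DDDDD" + toy_domain
copies = hma_copies(toy_protein)
print(len(copies), copies)
```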

[ l]Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994). 

[ 2]Lin S.-J., Culotta V.C. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

756. (Peptidase M10) Matrixins cysteine switch 
PROSITE cross-reference(s): CYSTEINE_SWITCH 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins 
[1] (see <PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an
inactive form (zymogen) that differs from the mature enzyme by the presence of an N- 
terminal propeptide. A highly conserved octapeptide is found two residues downstream of 
the C-terminal end of the propeptide. This region has been shown to be involved in 
autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site
zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or 
'autoinhibitor region'. 

A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase). 

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase). 

- MMP-3 (EC 3.4.24.17) (stromelysin-1). 

- MMP-7 (EC 3.4.24.23) (matrilysin). 

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase). 

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase). 

- MMP-10 (EC 3.4.24.22) (stromelysin-2). 

- MMP-11 (EC 3.4.24.-) (stromelysin-3). 

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase). 

- MMP-13 (EC 3.4.24.-) (collagenase 3). 

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4]. 

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5]. 

Description of pattern(s) and/or profile(s) 

Consensus pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]

[ 1]Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2]Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. 
Chem. 263:11892-11899(1988). 

[ 3]Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 
266:1584-1590(1991). 

[ 4]Lepage T., Gache C. EMBO J. 9:3003-3012(1990). 

[ 5]Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. 
U.S.A. 89:4693-4697(1992). 
757. (Peptidase S8) Serine proteases, subtilase family, active sites

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS,
PS00138; SUBTILASE_SER

Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is 
provided by a charge relay system similar to that of the trypsin family of serine proteases 
but which evolved by independent convergent evolution. The sequences around the
residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely
different from those of the analogous residues in the trypsin serine proteases and can be
used as signatures specific to that category of proteases.

The subtilase family currently includes the following proteases:

- Subtilisins (EC 3.4.21.62), these alkaline proteases from various Bacillus species have 
been the target of numerous studies in the past thirty years. 

- Alkaline elastase YaB from Bacillus sp. (gene ale). 

- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA). 

- Aqualysin I from Thermus aquaticus (gene pstI).

- AspA from Aeromonas salmonicida. 

- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf). 

- C5A peptidase from Streptococcus pyogenes (gene scpA). 

- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.

- Extracellular serine protease from Serratia marcescens. 

- Extracellular protease from Xanthomonas campestris. 

- Intracellular serine protease (ISP) from various Bacillus. 

- Minor extracellular serine protease epr from Bacillus subtilis (gene epr). 

- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr). 

- Nisin leader peptide processing protease nisP from Lactococcus lactis. 

- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).

- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris. 

- Calcium-dependent protease from Anabaena variabilis (gene prcA). 

- Halolysin from halophilic bacteria sp. 172p1 (gene hly).

- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2). 

- Alkaline proteinase from Cephalosporium acremonium (gene alp). 

- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1). 

- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.

- KEX-1 protease from Kluyveromyces lactis. 
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).

- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp). 

- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK). 

- Proteinase R from Tritirachium album (gene proR). 

- Proteinase T from Tritirachium album (gene proT). 

- Subtilisin-like protease III from yeast (gene YSP3). 

- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea. 

- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 
protease from mammals, other vertebrates, and invertebrates. These proteases are involved 
in the processing of hormone precursors at sites comprised of pairs of basic amino acid 
residues [3]. 

- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from Human. 

- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two 
domains: a N-terminal subtilase catalytic domain and a C-terminal ABC transporter domain 
(see <PDOC00185>). 

Description of pattern(s) and/or profile(s) 

Consensus pattern [STAIV]-x-[LIVM^
[D is the active site residue]

Consensus pattern ^
[H is the active site residue]

Consensus pattern G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the active site residue]

Note if a protein includes at least two of the three active site signatures, the probability of it
being a serine protease from the subtilase family is 100%.

Note these proteins belong to family S8 in the classification of
peptidases [5,E1].
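
The two-of-three rule in the note above can be written as a small decision function. In the sketch below only the serine signature is taken from this text, because it is the only one of the three patterns that is fully legible here. The aspartate and histidine entries are placeholder expressions invented so that the example runs; they are not the PROSITE patterns. The function names and toy sequences are likewise assumptions.

```python
import re

# Serine active-site signature, hand-translated from the legible pattern above.
SER_SIGNATURE = r"GTS.[SA].P.{2}[STAVC][AG]"


def matched_signatures(sequence: str, signatures: dict) -> list:
    """Return the names of the signatures found at least once in the sequence."""
    return [name for name, rx in signatures.items() if re.search(rx, sequence)]


def passes_two_of_three(sequence: str, signatures: dict) -> bool:
    """Apply the rule from the note: at least two active-site signatures hit."""
    return len(matched_signatures(sequence, signatures)) >= 2


# The Asp and His regexes below are PLACEHOLDERS invented for this demo;
# they are not the PROSITE patterns, which must be supplied separately.
demo_signatures = {
    "ASP": r"WWDWW",
    "HIS": r"YYHYY",
    "SER": SER_SIGNATURE,
}

# Hypothetical toy sequences, invented for this illustration.
two_hits = "WWDWWKKGTSMASPHVAG"
one_hit = "KKGTSMASPHVAG"
print(matched_signatures(two_hits, demo_signatures),
      passes_two_of_three(two_hits, demo_signatures))
print(matched_signatures(one_hit, demo_signatures),
      passes_two_of_three(one_hit, demo_signatures))
```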

[ 1]Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).

[ 2]Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).

[ 3]Barr P.J. Cell 66:1-3(1991).

[ 4]Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).

[ 5]Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

758. (SSB) Single-strand binding protein family signatures
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix- 
destabilizing protein, is a protein of 177 amino acids. It binds tightly, as a homotetramer, to 
single-stranded DNA (ss-DNA) and plays an important role in DNA replication, 
recombination and repair. 

Closely related variants of SSB are encoded in the genome of a variety of large self- 
transmissible plasmids. SSB has also been characterized in bacteria such as Proteus mirabilis 
or Serratia marcescens. 

Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in 
mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB.
Proteins currently known to belong to this subfamily are listed below [2]. 

- Mammalian protein Mt-SSB (P16). 

- Xenopus Mt-SSBs and Mt-SSBr. 

- Drosophila MtSSB. 

- Yeast protein RIM1. 

Two signature patterns have been developed for these proteins. The first is a conserved 
region in the N-terminal section of the SSB's. The second is a centrally located region which, 
in Escherichia coli SSB, is known to be involved in the binding of DNA. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVMFHNSTH
[GST]-x-[DET]

Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[ l]Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990). 
[ 2]Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994). 

759. KDPG and KHG aldolases active site signatures 
PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160;
ALDOLASE_KDPG_KHG_2

4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the 
interconversion of 4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2- 
dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-aldolase) catalyzes the 
interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and 
glyceraldehyde 3-phosphate. 

These two enzymes are structurally and functionally related [1]. They are both homotrimeric 
proteins of approximately 220 amino-acid residues. They are class I aldolases whose catalytic 
mechanism involves the formation of a Schiff-base intermediate between the substrate and 
the epsilon-amino group of a lysine residue. In both enzymes, an arginine is required for 
catalytic activity. 

Two signature patterns were developed for these enzymes. The first one contains the active 
site arginine and the second, the lysine involved in the Schiff-base formation. 

Description of pattern(s) and/or profile(s) 

Consensus pattern G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]

Consensus pattern G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base
formation]

[ 1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726; 
AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728; 
AP_NUCLEASE_F1_3 

DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those 
that generate oxygen radicals produce a variety of lesions in DNA. Amongst these is
base-loss which forms apurinic/apyrimidinic (AP) sites or strand breaks with atypical 3' termini.
DNA repair at the AP sites is initiated by specific endonuclease cleavage of the
phosphodiester backbone. Such endonucleases are also generally capable of removing
blocking groups from the 3' terminus of DNA strand breaks.

AP endonucleases can be classified into two families on the basis of sequence similarity. 
Family 1 groups the enzymes listed below [1]. 

- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA). 

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA). 

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues.
Rrp1 and arp both contain additional and unrelated sequences in their N-terminal section
(about 400 residues for Rrp1 and 270 for arp).

Three signature patterns were developed for this family of enzymes. The patterns are based 
on the most conserved regions. The first pattern contains a glutamate which has been
shown [2], in the Escherichia coli enzyme, to bind a divalent metal ion such as magnesium or
manganese.

Consensus pattern [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]

Consensus pattern D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)

Consensus pattern N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S

[ 1] Barzilay G., Hickson I.S. BioEssays 17:713-719(1995). 

[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

761. (ER)Enhancer of rudimentary signature, PROSITE cross-reference(s): PS01290; ER 

The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104
residues whose function is not yet clear. From an evolutionary point of view, it is highly
conserved [1] and has been found to exist in probably all multicellular eukaryotic
organisms. It has been proposed that this protein plays a role in the cell cycle.

A conserved region in the central part of the protein was selected as a signature pattern.

Consensus patternY-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S 

[ 1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-
195(1997).

762. (ETF alpha) Electron transfer flavoprotein alpha-subunit signature
PROSITE cross-reference(s): PS00696; ETF_ALPHA

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for 
various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory 
chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha
and a beta subunit and binds one molecule of FAD per dimer. A similar system also
exists in some bacteria. 

The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the 
bacterial nitrogen fixation protein fixB which could play a role in a redox process and feed 
electrons to ferredoxin. 

Other related proteins are: 

- Escherichia coli hypothetical protein ydiR. 

- Escherichia coli hypothetical protein ygcQ. 

A highly conserved region which is located in the C-terminal section was selected as a 
signature pattern for these proteins. 

Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-
A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

763. (lectin c) C-type lectin domain signature and profile 

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2

A number of different families of proteins share a conserved domain which was first 
characterized in some animal lectins and which seems to function as a calcium-dependent
carbohydrate-recognition domain [1,2,3]. This domain, which is known as the C-type lectin 
domain (CTL) or as the carbohydrate-recognition domain (CRD), consists of about 110 to 
130 residues. There are four cysteines which are perfectly conserved and involved in two 
disulfide bonds. A schematic representation of the CTL domain is shown below. 



xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
The pattern covers the C-terminal stretch CxxxxWxCxxxxC.



The categories of proteins, in which the CTL domain has been found, are listed below. 

Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of 
the proteins: 

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPRs
mediate the endocytosis of plasma glycoproteins from which the terminal sialic acid residue
in their carbohydrate moieties has been removed.

- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays 
an essential role in the regulation of IgE production and in the differentiation of B cells. 




- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could 
be involved in endocytosis. 

- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1,
YE1/88 (Ly-49), CD69 and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is
distantly related to other CTL-domains; it is unclear whether they are likely to bind
carbohydrates.

Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5];
these proteins are sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium- 
dependent protein that binds to surfactant phospholipids and contributes to 
lower the surface tension at the air-liquid interface in the alveoli of the 
mammalian lung. 

- Pulmonary surfactant-associated protein D (SP-D). 

- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast 
cell wall extract and to immune complexes through the complement component 
(iC3b). 

- Mannan-binding proteins (MBP) (also known as mannose-binding proteins). 
MBP's bind mannose and N-acetyl-D-glucosamine in a calcium-dependent 
manner. 

- Bovine collectin-43 (CL-43). 

Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the 
interaction of leukocytes with platelets or vascular endothelium. Structurally, selectins 
consist of a long extracellular domain, followed by a transmembrane region and a short 
cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain, 
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known 
selectins are: 

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion
molecule-1 (LAM-1), leu-8, gp90-mel, or LECAM-1).

- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2). 
The ligand recognized by ELAM-1 is sialyl-Lewis x. 
- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi
repeat, in their C-terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan
is a major component of the extracellular matrix of cartilaginous tissues
where it has a role in the resistance to compression.

- Brevican. 

- Neurocan. 

- Versican (large fibroblast proteoglycan), a large chondroitin sulfate 
proteoglycan that may play a role in intercellular signalling. 

In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal 
domain, an Ig-like V-type region, two or four link domains (see <PDOC00955>) and up to 
two EGF-like repeats. 

Two type-I membrane proteins: 

- Mannose receptor from macrophages. This protein mediates the endocytosis of
glycoproteins by macrophages in several recognition and uptake processes.
Its extracellular section consists of a fibronectin type II domain followed
by eight tandem repeats of the CTL domain.

- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose 
structure is highly similar to that of the mannose receptor. 

- DEC-205 receptor. This protein is used by dendritic cells and thymic
epithelial cells to capture and endocytose diverse carbohydrate-binding
antigens and direct them to antigen-processing cellular compartments. The
DEC-205 extracellular section consists of a fibronectin type II domain followed
by ten tandem repeats of the CTL domain.

- Silk moth hemocytin, a humoral lectin which is involved in a self-defence
mechanism. It is composed of 2 FA58C domains (see <PDOC00988>), a CTL
domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).

Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong
a humoral lectin from a flesh fly; echinoidin, a lectin from the coelomic
fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid
of a barnacle; a lectin from the tunicate Polyandrocarpa misakiensis; and a
newt oviduct lectin. The physiological importance of these lectins is not
yet known but they may play an important role in defense mechanisms.

- Pancreatic stone protein (PSP) (also known as pancreatic thread protein
(PTP), or reg), a protein that might act as an inhibitor of spontaneous
calcium carbonate precipitation.

- Pancreatitis associated protein (PAP), a protein that might be involved in 
the control of bacterial proliferation. 

- Tetranectin, a plasma protein that binds to plasminogen and to isolated 
kringle 4. 

- Eosinophil granule major basic protein (MBP), a cytotoxic protein. 

- A galactose specific lectin from a rattlesnake. 

- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp), 
a snake venom anticoagulant protein which binds with factors IX and X in 
the presence of calcium. 

- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake 
(PLI-A and PLI-B). 

- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a 
cockroach [8]. 

- Sea raven antifreeze protein (AFP) [9]. 

As a signature pattern for this domain, the C-terminal region with its three conserved 
cysteines was selected. 

Consensus pattern C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-
[FYWLIVSTA]-[LIVMSTA]-C [The three Cs are involved in disulfide
bonds]

Note: all CTL domains have a Trp residue five positions before the second Cys,
with the exception of tunicate lectin and cockroach LPS-BP which
have Leu.

Note: this documentation entry is linked to both a signature pattern
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 
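
Because the C-type lectin consensus pattern given above contains the variable-length gaps x(5,12) and x(5,6), a match does not have a fixed length. The following sketch, offered only as an illustration, computes the shortest and longest stretch of residues that such a pattern can span. The helper name match_length_bounds is an assumption introduced for this example.

```python
import re

# C-type lectin signature pattern, as given in the text.
CTL_PATTERN = ("C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-"
               "[FYWLIVSTA]-[LIVMSTA]-C")

def match_length_bounds(pattern):
    """Return (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # A trailing (n) or (n,m) gives the repeat count; otherwise one residue.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        low = int(m.group(1)) if m else 1
        high = int(m.group(2)) if m and m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest

print(match_length_bounds(CTL_PATTERN))  # (21, 29)
```

Under this reading, a match to the pattern covers between 21 and 29 residues of the domain.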

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988). 

[2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993). 

[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993). 

[ 4] Spiess M. Biochemistry 29:10009-10018(1990). 

[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608- 
1615(1991). 

[ 6] Siegelman M. Curr. Biol. 1:125-128(1991). 

[ 7] Lasky L.A. Science 258:964-969(1992).

[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991). 

[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992). 

764. (SRCR) Speract receptor repeated domain signature
PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR

The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 
500 amino acid residues [1]. Structurally it consists of a large extracellular domain of 450 
residues, followed by a transmembrane region and a small cytoplasmic domain of 12 amino 
acids. The extracellular domain contains four repeats of a 115-amino-acid domain. There are
17 positions that are perfectly conserved in the four repeats, among them are six cysteines, 
six glycines, and three glutamates. 

Such a domain is also found, once, in the C-terminal section of mammalian macrophage
scavenger receptor type I [2], a membrane glycoprotein implicated in the pathologic
deposition of cholesterol in arterial walls during atherogenesis.

The signature pattern that was derived spans part of the N-terminal section of the domain and
contains 8 of the 17 conserved residues.

Consensus pattern G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
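
The statement that the pattern contains 8 of the 17 conserved residues can be checked directly against the pattern: the invariant positions are the elements that name exactly one residue. The short sketch below is an illustration only; it counts those elements and ignores 'x' gaps and bracketed alternatives such as [FYW].

```python
# Speract receptor repeat signature pattern, as given in the text.
SRCR_PATTERN = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"

# Keep only elements that name exactly one residue (a single uppercase letter).
invariant = [e for e in SRCR_PATTERN.split("-") if len(e) == 1 and e.isupper()]

print(len(invariant), "".join(invariant))  # 8 GGEWGCCG
```

The eight fixed positions are four glycines, two cysteines, one glutamate and one tryptophan.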

[ 1] Dangott J.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 
86:2128-2132(1989). 

[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., 
Krieger M. Proc. Natl. Acad. Sci. U.S.A. 87:8810-8814(1990). 

765. Bac_surface_Ag 
Bacterial surface antigen 

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87
from P. multocida, and OMP85 from N. meningitidis and N. gonorrhoeae. Number of members:
14

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of
Haemophilus influenzae. Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97- 
99. 

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the 
oma87 gene encoding the Pasteurella multocida 87-kilodalton outer membrane antigen. 
Ruffolo CG, Adler B; Infect Immun 1996;64:3161-3167. 

766. BRCA1 C Terminus (BRCT) domain 

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint
functions responsive to DNA damage. It has been suggested that the Retinoblastoma protein
contains a divergent BRCT domain, but this has not been included in this family. The BRCT
domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060). This
suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of
members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul 
SF, Bork P; Nature Genet 1996;13:266-268. 
[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely
associated with DNA repair. Callebaut I, Mornon JP; FEBS Lett 1997;400:25-30.

[3] Medline: 97186552. A superfamily of conserved domains in DNA damage responsive cell
cycle checkpoint proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin
EV; FASEB J 1997;11:68-76.

[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein 
database search programs. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller 
W, Lipman DJ; Nucleic Acids Res 1997;25:3389-3402. 

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein 
interaction module. Zhang X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, 
Nash RA, Sternberg MJ, Lindahl T, Freemont PS; 

767. Kappa casein 

Kappa-casein is a mammalian milk protein involved in a number of important physiological 
processes. In the gut, the ingested protein is split into an insoluble peptide (para kappa- 
casein) and a soluble hydrophilic glycopeptide (caseinomacropeptide). Caseinomacropeptide 
is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to 
ingested proteins, and inhibition of gastric pathogens. Number of members: 56 

[1] Medline: 98072500. Nucleotide sequence evolution at the kappa-casein locus: evidence 
for positive selection within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics 
1997;147:1863-1872. 

768. Chitinases family 18 active site 
PROSITE cross-reference(s): CHITINASE_18

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the
beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence
similarity, chitinases belong to either family 18 or 19 in the classification of glycosyl
hydrolases [2,E1]. Family 18 (also known as classes III or V) groups a variety
of proteins:
a) Chitinases from: 

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc. 

- Plants such as Arabidopsis, cucumber, bean, tobacco, etc. 




- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc. 

- Nematode (Brugia malayi). 

- Insects (Manduca sexta). 

- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities. 

- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase. 

- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).

- Mammalian di-N-acetylchitobiase which is involved in the degradation of asparagine-
linked glycoproteins.

- Human cartilage glycoprotein Gp-39. 

- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity. 

Site directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a 
conserved glutamate is involved in the catalytic mechanism and probably acts as a proton 
donor. This glutamate is at the extremity of the best conserved region in these proteins. 

Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active
site residue]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[ 3] Watanabe T., Kohori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol.
Chem. 268:18567-18572(1993).

[ 4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E.
Structure 2:1169-1180(1994).

[ 5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-
1189(1994).

769. gag_p17. gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the
mature immunodeficiency virus. Number of members: 1598

[1] Medline: 95055757. Three-dimensional structure of the human immunodeficiency virus
type 1 matrix protein. Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM,
Sundquist WI; J Mol Biol 1994;244:198-223.

770. GDA1/CD39 family of nucleoside phosphatases signature 
PROSITE cross-reference(s): GDA1_CD39_NTPASE

A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet 
uncharacterized proteins have been found to belong to the same family [1, 2]. This family
currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a Golgi
integral membrane enzyme that catalyzes the hydrolysis of GDP to GMP.

- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both
ATP and ADP to produce AMP.

- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid
cell activation antigen CD39).

- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase 
hydrolyses various nucleoside triphosphates to produce the corresponding nucleoside 
mono- and diphosphates. This enzyme is secreted into the invaded host cell, into the
parasitophorous vacuole, a specialized compartment where the parasite resides
intracellularly.

- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). 

- Caenorhabditis elegans hypothetical protein C33H5.14. 

- Caenorhabditis elegans hypothetical protein R07E4.4.

- Yeast chromosome V hypothetical protein YER005w.

The above uncharacterized proteins all seem to be membrane-bound. 

All these proteins share a number of conserved domains. The best conserved of these
domains has been selected as a signature pattern. It is located in the central section of the
proteins.

Consensus pattern [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]

[ 1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).
[ 2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M.,
Mantilla M., Valenzuela M.A., Verjovski-Almeida S. J. Biol. Chem. 271:22139-
22145(1996).

771. GTP cyclohydrolase I signatures 

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and
dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of 
tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine- 
containing pigments in insects. 

GTP cyclohydrolase I is a protein of 190 to 250 amino acid residues. The comparison
of the sequence of the enzyme from bacterial and eukaryotic sources shows that the 
structure of this enzyme has been extremely well conserved throughout evolution [1]. 

Two conserved regions were selected as signature patterns. The first contains a perfectly
conserved tetrapeptide which is part of the GTP-binding pocket [2]; the second region also
contains conserved residues involved in GTP-binding.

Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[ 1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. 
Biophys. Res. Commun. 212:705-711(1995). 

[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-
466(1995). 

772. IlvC. Acetohydroxy acid isomeroreductase 

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into 
dihydroxy valerates. This reaction is the second in the synthetic pathway of the essential 
branched side chain amino acids valine and isoleucine. Number of members: 29

[1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase
complexed with NADPH, two magnesium ions and a herbicidal transition state analog 
determined at 1.65 A resolution. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D, Pebay- 
Peyroula E; EMBO J 1997;16:3405-3415. 

773. Prokaryotic membrane lipoprotein lipid attachment site 
PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, 
which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The 
peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which 
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such 
processing currently include (for recent listings see [1,2,3]): 

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). 

- Escherichia coli lipoprotein-28 (gene nlpA).

- Escherichia coli lipoprotein-34 (gene nlpB). 

- Escherichia coli lipoprotein nlpC. 

- Escherichia coli lipoprotein nlpD. 

- Escherichia coli osmotically inducible lipoprotein B (gene osmB). 

- Escherichia coli osmotically inducible lipoprotein E (gene osmE). 

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal). 

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB). 

- Escherichia coli copper homeostasis protein cutF (or nlpE). 

- Escherichia coli plasmids traT proteins. 

- Escherichia coli Col plasmids lysis proteins. 

- A number of Bacillus beta-lactamases. 

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). 

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB). 

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). 

- Chlamydia trachomatis outer membrane protein 3 (gene omp3). 

- Fibrobacter succinogenes endoglucanase cel-3. 

- Haemophilus influenzae proteins Pal and Pep. 

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).

- Neisseria outer membrane protein H.8. 

- Pseudomonas aeruginosa lipopeptide (gene lppL). 

- Pseudomonas solanacearum endoglucanase egl. 

- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). 

- Rickettsia 17 Kd antigen. 

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM. 

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). 

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb). 

- Yersinia virulence plasmid protein yscJ. 

- Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding
protein. This is the first archaebacterial protein known to be modified in such a fashion.

From the precursor sequences of all these proteins, we derived a consensus pattern and a 
set of rules to identify this type of post-translational modification. 

Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the
lipid attachment site]
Additional rules:
1) The cysteine must be between positions 15 and 35 of the sequence in consideration.
2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
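
The consensus pattern and the two additional rules can be combined into a single check. The sketch below is an illustration only: the function name lipid_attachment_site and the test peptides are made up for this example, and the position range 15 to 35 is assumed to be inclusive and counted from 1.

```python
import re

# Regular-expression form of the lipoprotein pattern; the final C is the
# lipid attachment site.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")

def lipid_attachment_site(precursor):
    """Return the 1-based position of a candidate lipid-attachment Cys, or None."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in precursor[:7]):
        return None
    # Rule 1: the cysteine must lie between positions 15 and 35
    # (assumed inclusive, 1-based). Every start offset is tried so that
    # overlapping candidates are not missed.
    for start in range(len(precursor)):
        m = LIPOBOX.match(precursor, start)
        if m and 15 <= m.end() <= 35:
            return m.end()  # 0-based exclusive end equals 1-based Cys position
    return None

# Made-up precursor peptides, for illustration only.
print(lipid_attachment_site("MKTLLAVLGAVILGSTLLAGCSSN"))  # 21
print(lipid_attachment_site("MATLLAVLGAVILGSTLLAGCSSN"))  # None (no Lys/Arg in first seven)
print(lipid_attachment_site("MKALLGSTLLAGC"))             # None (Cys at position 13)
```

A hit from such a check is only a candidate site; it does not by itself show that the protein is lipid-modified.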

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).

[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol.
Chem. 269:14939-14945(1994).

774. Aminoacyl-transfer RNA synthetases class-II signatures 

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1; AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate
amino acids and transfer them to specific tRNA molecules as the first step in protein
biosynthesis. In prokaryotic organisms there are at least twenty different types of
aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are 
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic 
form and a mitochondrial form. While all these enzymes have a common function, they are 
widely diverse in terms of subunit size and of quaternary structure. 

The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, 
phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] 
and probably have a common folding pattern in their catalytic domain for the binding of 
ATP and amino acid which is different to the Rossmann fold observed for the class I 
synthetases [7]. 

Class-II tRNA synthetases do not share a high degree of similarity; however, at least three
conserved regions are present [2,5,8]. Signature patterns from two of these regions have been 
derived. 

Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-
[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-
255(1990).

[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

775. X. Trans-activation protein X

This protein is found in hepadnaviruses where it is indispensable for replication. Number of
members: 91

776. Thymidylate synthase active site

Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of 
dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate to 
dihydrofolate. Thymidylate synthase plays an essential role in DNA synthesis and is an 
important target for certain chemotherapeutic drugs.

Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in
protozoa and plants where it exists as a bifunctional enzyme that includes a dihydrofolate
reductase domain. 

A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6- 
dihydro-dUMP intermediate). The sequence around the active site of this enzyme is 
conserved from phages to vertebrates. 

Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-
[QMT]-[FYW]-x-[LV] [C is the active site residue]

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980). 

[ 2] Ross P., O'Gara R, Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990). 

777. Glycosyl hydrolases family 31 signatures 

It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the 
basis of sequence similarities, classified into a single family: 

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase
active at low pH, which hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen,
maltose, and isomaltose.

- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.

- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacterium Sulfolobus
solfataricus.

- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, 
multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose. The 
sucrase and isomaltase domains of the enzyme are homologous (41% of amino acid identity) 
and have most probably evolved by duplication. 

- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species. 

- Yeast hypothetical protein YBR229c. 

- Fission yeast hypothetical protein SpAC30D11.01c.

An aspartic acid has been implicated [4] in the catalytic activity of sucrase,
isomaltase, and lysosomal alpha-glucosidase. The region around this active residue is highly 
conserved and can be used as a signature pattern. A second region, which contains two 
conserved cysteines, has been used as an additional signature pattern. 

Consensus pattern [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue] 
Consensus pattern G-[AV]-D-[LIVMTA]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]- 
[GSA]-[SA]-F-x-P-F-x-R-[DN] 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991). 
[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett.
294:109-112(1991).

[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol.
Chem. 266:13507-13512(1991). 

778. Urease signatures 

Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea 
to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in 
1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease 
is a hexamer of identical chains. In bacteria [2], it consists of either two or three different 
subunits (alpha, beta and gamma). 

Urease binds two nickel ions per subunit; four histidines, an aspartate and a
carbamated-lysine serve as ligands to these metals; an additional histidine is involved in the
catalytic mechanism [3].

As signatures for this enzyme, a region was selected that contains two histidines that
bind one of the nickel ions and the region of the active site histidine.

Consensus pattern T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind 
nickel] 

Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the 
active site residue] 
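
When a pattern annotates particular residues, such as the two nickel-binding histidines of the first urease pattern above, the positions of those residues within a hit are usually of more interest than the hit itself. The sketch below, given as an illustration only, marks the two histidines with named groups in a hand-written regular-expression form of the pattern. The group names, the function name nickel_histidines and the test peptide are assumptions made for this example.

```python
import re

# Hand-written regular-expression form of the first urease pattern.
# Named groups mark the two histidines annotated as nickel ligands.
UREASE_1 = re.compile(
    r"T[AY][GA][GAT][LIVM]D.(?P<his1>H)[LIVM](?P<his2>H).{3}P"
)

def nickel_histidines(sequence):
    """Return 1-based (his1, his2) positions for the first match, or None."""
    m = UREASE_1.search(sequence)
    if m is None:
        return None
    return m.start("his1") + 1, m.start("his2") + 1

# Made-up peptide, for illustration only (not a real urease sequence).
print(nickel_histidines("GGTAGGIDTHIHWICPQ"))  # (10, 12)
```

The same approach applies to any of the patterns in which a bracketed note singles out an active-site or ligand residue.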

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988). 
[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

779. Tyrosine specific protein phosphatases signature and profiles 

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes 
that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes 
are very important in the control of cell growth, proliferation, differentiation and 
transformation. Multiple forms of PTPase have been characterized and can be classified into 
two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase 
domain(s). The currently known PTPases are listed below: 

Soluble PTPases. 

- PTPN1 (PTP-1B). 

- PTPN2 (T-cell PTPase; TC-PTP). 

- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like
domain (see <PDOC00566>) and could act at junctions between the membrane and
cytoskeleton.

- PTPN5 (STEP).

- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which
contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein
corkscrew (gene csw) also belongs to this subgroup.

- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP). 

- PTPN8 (70Z-PEP). 

- PTPN9 (MEG2). 

- PTPN12 (PTP-G1; PTP-P19). 

- Yeast PTP1. 

- Yeast PTP2 which may be involved in the ubiquitin-mediated protein degradation 
pathway. 

- Fission yeast pyp1 and pyp2 which play a role in inhibiting the onset of mitosis.

- Fission yeast pyp3 which contributes to the dephosphorylation of cdc2. 

- Yeast CDC14 which may be involved in chromosome segregation. 

- Yersinia virulence plasmid PTPases (gene yopH).

- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP
kinase on both Thr-183 and Tyr-185.

- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 
on both Thr and Tyr residues. 

- DUSP3 (VHR). 

- DUSP4 (HVH2). 

- DUSP5 (HVH3). 

- DUSP6 (Pyst1; MKP-3).

- DUSP7 (Pyst2; MKP-X). 

- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.

- Yeast YVH1.

- Vaccinia virus H1 PTPase; a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable length
extracellular domain, followed by a transmembrane region and a C-terminal catalytic
cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III)
repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the
PTPase domain. The first seems to have enzymatic activity, while the second is inactive but
seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is
generally conserved but some other, presumably important, residues are not.

In the following table, the domain structure of known receptor PTPases is shown: 

                                        Extracellular            Intracellular
Receptor                                Ig    FN-3  CAH   MAM    PTPase
Leukocyte common antigen (LCA) (CD45)   0     2     0     0      2
Leukocyte antigen related (LAR)         3     8     0     0      2
Drosophila DLAR                         3     9     0     0      2
Drosophila DPTP                         2     2     0     0      2
PTP-alpha (LRP)                         0     0     0     0      2
PTP-beta                                0     16    0     0      1
PTP-gamma                               0     1     1     0      2
PTP-delta                               0     >7    0     0      2
PTP-epsilon                             0     0     0     0      2
PTP-kappa                               1     4     0     1      2
PTP-mu                                  1     4     0     1      2
PTP-zeta                                0     1     1     0      2



PTPase domains consist of about 300 amino acids. There are two conserved cysteines;
the second one has been shown to be absolutely required for activity. Furthermore, a number
of conserved residues in its immediate vicinity have also been shown to be important.

A signature pattern was derived for PTPase domains centered on the active site 
cysteine. 

There are three profiles for PTPases. The first one spans the complete domain and is
not specific to any subtype. The second profile is specific to dual-specificity PTPases and the
third one to the PTP subfamily.



Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active
site residue]
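
A consensus pattern of this kind can be checked mechanically against a sequence. The following is a minimal sketch, not part of the original documentation. It assumes the usual pattern conventions (x for any residue, [...] for allowed residues, {...} for excluded residues, (n) or (n,m) for repeat counts) and translates a pattern into a Python regular expression. The function name prosite_to_regex and the test peptide are hypothetical and serve only as an illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Assumed syntax: elements joined by '-', 'x' = any residue,
    '[ABC]' = one of, '{ABC}' = none of, '(n)' or '(n,m)' = repeat count.
    """
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("cannot parse element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low and high:
            core += "{%s,%s}" % (low, high)   # variable repeat
        elif low:
            core += "{%s}" % low              # fixed repeat
        pieces.append(core)
    return "".join(pieces)

# PTPase signature from the text; the cysteine is the active site residue.
PTPASE = "[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]"
ptpase_regex = prosite_to_regex(PTPASE)

# Hypothetical peptide used only to exercise the pattern.
peptide = "AAPIVVHCSAGIGRSGTFCLAD"
hit = re.search(ptpase_regex, peptide)
```

Here hit.start() gives the 0-based offset of the signature in the peptide, so the active site cysteine lies two residues further on. As the note on profiles indicates, a pattern match of this sort is less sensitive than a profile search and is best treated as a first screen.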

Note: the M-phase inducer phosphatases (cdc25-type phosphatases) are tyrosine-protein
phosphatases that are not structurally related to the above PTPases.

Note: this documentation entry is linked to both a signature pattern and to profiles. As
profiles are much more sensitive than the pattern, you should use them if you have access to
the necessary software tools to do so.

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991). 
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992). 
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989). 
[ 5] Hunter T. Cell 58:1013-1016(1989). 



780. Connexins signatures 

Gap junctions [1] are specialized regions of the plasma membrane which consist of
closely packed pairs of transmembrane channels, the connexons, through which small
molecules diffuse from a cell to a neighboring cell. Each connexon is composed of a
hexamer of an integral membrane protein which is often referred to as connexin. In a given
species there are a number of different, yet structurally related, tissue-specific forms of
connexins. The types of connexins which are currently known are listed below.

- Connexin 56 (Cx56).

- Connexin 50 (Cx50) (lens fiber protein MP70).

- Connexin 46 (Cx46) (alpha-3).

- Connexin 45 (Cx45) (alpha-6).

- Connexin 43 (Cx43) (alpha-1).

- Connexin 40 (Cx40) (alpha-5).

- Connexin 38 (Cx38) (alpha-2).

- Connexin 37 (Cx37) (alpha-4).

- Connexin 33 (Cx33) (alpha-7).

- Connexin 32 (Cx32) (beta-1).

- Connexin 31.1 (Cx31.1) (beta-4).

- Connexin 31 (Cx31) (beta-3).

- Connexin 30.3 (Cx30.3) (beta-5).

- Connexin 26 (Cx26) (beta-2).

Structurally the connexins consist of a short cytoplasmic N-terminal domain, followed 
by four transmembrane segments that delimit two extracellular and one cytoplasmic loops; 
the C-terminal domain is cytoplasmic and its length is variable (from 20 residues in Cx26 to 
260 residues in Cx56). The schematic representation of this structure is shown below. 

[Schematic: the NH2-terminus and the COOH-terminus lie on the cytoplasmic side; four
segments cross the membrane, joined by two extracellular loops and one cytoplasmic loop.]

The sequences of the two extracellular loops are well conserved. In both loops there
are three conserved cysteines which are involved in disulfide bonds. A signature pattern
has been built from each of these two loop regions.

Consensus pattern: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in
disulfide bonds]

Consensus pattern: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P [The three
C's are involved in disulfide bonds]

[ 1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996). 

781. Gram-positive cocci surface proteins 'anchoring' hexapeptide 

Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a
few residues upstream of a hydrophobic C-terminal membrane anchor region which is
followed by a cluster of basic amino acids [1]. This structure is shown in the following
schematic representation:

+--------------------------------------+-+--------+-+
| Variable length extracellular domain |H| Anchor |B|
+--------------------------------------+-+--------+-+

'H': conserved hexapeptide. 
'B': cluster of basic residues. 

It has been proposed that this hexapeptide sequence is responsible for a post-translational
modification necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).

- C5a peptidase from Streptococcus pyogenes (scpA). 

- C protein alpha-antigen from Streptococcus agalactiae (bca). 

- Cell surface antigen I/II (PAC) from Streptococcus mutans. 

- Dextranase from Streptococcus downei (dex). 

- Fibronectin-binding protein from Staphylococcus aureus (fnbA). 

- Fimbrial subunits from Actinomyces naeslundii and viscosus. 

- IgA binding protein from Streptococcus pyogenes (arp4). 

- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).

- IgG binding proteins from Streptococci and Staphylococcus aureus.

- Internalin A from Listeria monocytogenes (inlA). 

- M proteins from streptococci. 

- Muramidase-released protein from Streptococcus suis (mrp). 

- Nisin leader peptide processing protease from Lactococcus lactis (nisP). 

- Protein A from Staphylococcus aureus. 

- Trypsin-resistant surface T protein from streptococci. 

- Wall-associated protein from Streptococcus mutans (wapA). 

- Wall-associated serine proteinases from Lactococcus lactis. 

Consensus pattern: L-P-x-T-G-[STGAVDE]
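
The layout described above (hexapeptide, then membrane anchor, then basic cluster, all close to the C-terminus) suggests a simple positional check in addition to the pattern match. The sketch below is an illustration under stated assumptions: the cut-offs max_tail and basic_window and the function name find_anchor_signal are hypothetical, not values from the source, and the code does not score the hydrophobicity of the anchor region.

```python
import re

# L-P-x-T-G-[STGAVDE], written as a regular expression.
HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

def find_anchor_signal(seq, max_tail=50, basic_window=8):
    """Return (start, tail) for a C-terminal hexapeptide, or None.

    start is the 0-based index of the hexapeptide; tail is everything
    after it. max_tail and basic_window are assumed cut-offs.
    """
    for m in reversed(list(HEXAPEPTIDE.finditer(seq))):
        tail = seq[m.end():]
        if not tail or len(tail) > max_tail:
            continue                      # motif is not near the C-terminus
        if any(res in "KR" for res in tail[-basic_window:]):
            return m.start(), tail        # basic residues close the sequence
    return None

# Hypothetical sequence: filler, hexapeptide, hydrophobic stretch, basic cluster.
example = "MKKA" * 5 + "LPETGE" + "AGTIALVAGLLLAIGA" + "RRRKEN"
result = find_anchor_signal(example)
```

Scanning from the last match backwards reflects the expectation that only an occurrence near the C-terminus is meaningful; an internal match with a long tail is skipped.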

[ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).

782. Gamma-glutamyltranspeptidase signature 

Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the 
gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or 
water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway 
for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an 
enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a 
single chain precursor. The active site of GGT is known to be located in the light subunit. 

The sequences of mammalian and bacterial GGT show a number of regions of high
similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-
carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid
(7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity
[3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.

One of the conserved regions corresponds to the N-terminal extremity of the mature
light chains of these enzymes. This region has been used as a signature pattern.

Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G

[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).

[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).

[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).

783. Ferrochelatase signature 

Ferrochelatase (EC 4.99.1.1) (protoheme ferrolyase) [1,2] catalyzes the last step in
heme biosynthesis: the chelation of a ferrous ion to protoporphyrin IX, to form protoheme.

In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane,
whose active site faces the mitochondrial matrix. The mature form of eukaryotic
ferrochelatase is composed of about 360 amino acids. In bacteria, ferrochelatase (gene hemH)
[3] is a protein of 310 to 380 amino acids.

The human autosomal dominant disease protoporphyria is due to the reduced activity 
of ferrochelatase. 

The signature pattern for this enzyme is based on a conserved region which contains a 
histidine residue which could be involved in binding iron. 

Consensus pattern: [LIVMF](2)-x-[ST]-x-H- ... -x(1,2)-Y

[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990). 

[ 2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991). 

[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991).

784. Cellulose-binding domain, bacterial type 

The microbial degradation of cellulose and xylans requires several types of enzymes
such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or
xylanases (EC 3.2.1.8) [1].

Structurally, cellulases and xylanases generally consist of a catalytic domain joined 
to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or 
hydroxy-amino acids. 

The CBD of a number of bacterial cellulases has been shown to consist of about 105
amino acid residues [2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.

- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.

- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.

- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.

- Endoglucanase A (gene celA) from Microbispora bispora. 

- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens. 

- Endoglucanase A (gene celA) from Streptomyces lividans.

- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.

- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens. 

- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas
fluorescens.

- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.

- Chitinase C from Streptomyces lividans.

The CBD domain is found either at the N-terminal or at the C-terminal extremity of these
enzymes. As shown in the following schematic representation, there are two conserved
cysteines in this CBD domain - one at each extremity of the domain - which have been shown
[3] to be involved in a disulfide bond. There are also four conserved tryptophan residues
which could be involved in the interaction of the CBD with polysaccharides.

 +-------------------------------------------------+
 |                                                 |
xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
                                 **************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 
55:303-315(1991). 

[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data
Anal. 4:349-353(1991).

[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., 
Kilburn D.G., Warren R.A.J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991). 



785. Amidases signature

It has been shown [1,2,3] that several enzymes from various prokaryotic and
eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are
evolutionarily related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes 
the hydrolysis of indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in 
the biosynthesis of auxins from tryptophan. 

- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to 
be used as a sole carbon or nitrogen source. 

- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene 
amdA). This enzyme hydrolyzes propionamides efficiently, and also at a lower efficiency, 
acetamide, acrylamide and indoleacetamide. 

- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis. 

- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading
enzyme EI) (gene nylA), a bacterial plasmid encoded enzyme which catalyzes the first step
in the degradation of 6-aminohexanoic acid cyclic dimer, a by-product of nylon manufacture
[4].

- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].

- Mammalian fatty acid amide hydrolase (gene FAAH) [6]. 

- A putative amidase from yeast (gene AMD2). 

- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD. 

All these enzymes contain in their central section a highly conserved region rich in glycine, 
serine, and alanine residues. This region has been used as a signature pattern. 

Consensus pattern: G-[GA]-S-[GS]-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)- 
[GSAT]-x-[GA]-x-[DE]-x-[GA]-x-S-[LIVM]-R-x-P-[GSAC] 

[ 1] Mayaux J.F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764- 
6773(1990). 

[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. 
Acta 1088:225-233(1991). 

[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990). 

[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol.
171:3187-3191(1989).

[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M.,
Soll D. Proc. Natl. Acad. Sci. U.S.A. 94:11819-11826(1997).

[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 
384:83-87(1996). 

786. Glycosyl hydrolases family 10 active site 

The microbial degradation of cellulose and xylans requires several types of enzymes
such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or
xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes
(cellulases) and xylanases which, on the basis of sequence similarities, can be classified into
families. One of these families is known as the cellulase family F [3] or as the glycosyl
hydrolases family 10 [4,E1]. The enzymes which are currently known to belong to this
family are listed below.

- Aspergillus awamori xylanase A (xynA). 

- Bacillus sp. strain 125 xylanase (xynA). 

- Bacillus stearothermophilus xylanase. 

- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB). 

- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This
protein consists of two domains; it is the N-terminal domain, which has exoglucanase
activity, which belongs to this family.

- Caldocellum saccharolyticum xylanase A (xynA). 

- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC 
operon and is probably a xylanase. 

- Cellulomonas fimi exoglucanase/xylanase (cex). 

- Clostridium stercorarium thermostable celloxylanase. 

- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ). 

- Cryptococcus albidus xylanase. 

- Penicillium chrysogenum xylanase (gene xylP). 

- Pseudomonas fluorescens xylanases A (xynA) and B (xynB). 

- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of
three domains: an N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl
hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal
xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.

- Streptomyces lividans xylanase A (xlnA).

- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).

- Thermoascus aurantiacus xylanase. 

- Thermophilic bacterium Rt8.B4 xylanase (xynA). 

One of the conserved regions in these enzymes is centered on a conserved glutamic acid 
residue which has been shown [5], in the exoglucanase from Cellulomonas fimi, to be 
directly involved in glycosidic bond cleavage by acting as a nucleophile. This region has 
been used as a signature pattern. 

Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the
active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev.
55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).

[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol.
Chem. 266:15621-15625(1991).

787. Fructose-bisphosphate aldolase class-II signatures 

Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that
catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into
dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of
fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases [2],
mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent
metal ion - generally zinc - for their activity.

This family also includes the following proteins: 

- Escherichia coli galactitol operon protein gatY which catalyzes the transformation of
tagatose 1,6-bisphosphate into glycerone phosphate and D-glyceraldehyde 3-phosphate.

- Escherichia coli N-acetyl galactosamine operon protein agaY which catalyzes the same
reaction as that of gatY.

As signature patterns for this class of enzyme, two conserved regions were selected. The first
pattern is located in the first half of the sequence and contains two histidine residues that have
been shown [4] to be involved in binding a zinc ion. The second is located in the C-terminal
section and contains clustered acidic residues and glycines.

Consensus pattern: [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH]
[The two H's are zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990). 

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992). 

[ 3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol.
3:1625-1637(1989).

[ 4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

788. Prolyl oligopeptidase family serine active site 

The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related
peptidases whose catalytic activity seems to be provided by a charge relay system similar to
that of the trypsin family of serine proteases, but which evolved by independent convergent
evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is 
an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence 
of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium 
meningosepticum and Aeromonas hydrophila); there is a high degree of sequence 
conservation between these sequences. 

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves 
peptide bonds on the C-terminal side of lysyl and argininyl residues. 

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N- 
terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided 
that the penultimate residue is proline. 

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible 
for the proteolytic maturation of the alpha-factor precursor. 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2). 




- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme
catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to
generate an N-acetylated amino acid and a protein with a free amino-terminus.

A conserved serine residue has experimentally been shown (in E. coli protease II as well as in
pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part
of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the
C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800
amino acids).

Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the
active site residue]

Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases
[4,E1].
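
The positional statement above (active site serine roughly 150 residues from the C-terminus) can be examined together with the signature itself. The sketch below is illustrative only: the function name active_site_serines and the test sequence are hypothetical, and the regular expression is a direct transcription of the consensus pattern with the serine placed in a capture group.

```python
import re

# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2); the serine is captured.
ACTIVE_SITE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")

def active_site_serines(seq):
    """Return (position, residues_after) for each signature hit.

    position is the 1-based index of the serine; residues_after is the
    number of residues between that serine and the C-terminus.
    """
    hits = []
    for m in ACTIVE_SITE.finditer(seq):
        pos = m.start(1) + 1
        hits.append((pos, len(seq) - pos))
    return hits

# Hypothetical sequence built so that the serine sits 155 residues from the end.
example = "M" * 10 + "D" + "A" * 7 + "L" + "Q" * 14 + "GYSMGGFL" + "T" * 150
sites = active_site_serines(example)
```

Reporting the distance rather than enforcing a cut-off leaves the judgment about "about 150 residues" to the reader, since the text gives it only as a general tendency.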

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991). 
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

789. Formate-tetrahydrofolate ligase signatures

Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase)
(FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential 
element of various biosynthetic pathways. In many of these processes the transfers of one- 
carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions 
generate one-carbon derivatives of THF which can be interconverted between different 
oxidation states by FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and 
methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). 

In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme,
C-1-tetrahydrofolate synthase (C1-THF synthase), which also catalyzes the dehydrogenase and
cyclohydrolase activities. Two forms of C1-THF synthase are known [1]: one is located in
the mitochondrial matrix, while the second one is cytoplasmic. In both forms the FTHFS
domain consists of about 600 amino acid residues and is located in the C-terminal section of
C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional
homotetrameric enzyme of about 560 amino acid residues [2].

The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature 
patterns, two regions that are almost perfectly conserved were selected. The first one is a 
glycine-rich segment located in the N-terminal part of FTHFS and which could be part of an 
ATP-binding domain [2]. The second pattern is located in the central section of FTHFS. 

Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). 

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990). 

790. Transthyretin signatures 

Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to 
transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino 
acids that assembles as a homotetramer and forms an internal channel that binds thyroxine. 
Transthyretin is mainly synthesized in the brain choroid plexus. In humans, variants of the 
protein are associated with distinct forms of amyloidosis. 

The sequence of transthyretin is highly conserved in vertebrates. A number of 
uncharacterized proteins also belong to this family: 

- Escherichia coli hypothetical protein yedX. 

- Bacillus subtilis hypothetical protein yunM. 

- Caenorhabditis elegans hypothetical protein R09H10.3. 

- Caenorhabditis elegans hypothetical protein ZK697.8. 

Two regions were selected as signature patterns. The first, located in the N-terminal
extremity, starts with a lysine known to be involved in binding T4. The second pattern is
located in the C-terminal extremity.



Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]
Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

791. Dihydropteroate synthase signatures 

All organisms require reduced folate cofactors for the synthesis of a variety of 
metabolites. Most microorganisms must synthesize folate de novo because they lack the 
active transport system of higher vertebrate cells which allows these organisms to use dietary 
folates. Enzymes that are involved in the biosynthesis of folates are therefore the target of a 
variety of antimicrobial agents such as trimethoprim or sulfonamides. 

Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of
6-hydroxymethyl-7,8-dihydropteridine pyrophosphate with para-aminobenzoic acid to form
7,8-dihydropteroate. This is the second step in the three-step pathway leading from
6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides,
which are substrate analogs that compete with para-aminobenzoic acid.

Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid 
residues which is either chromosomally encoded or found on various antibiotic resistance 
plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a 
multifunctional folate synthesis enzyme (gene fas) [2]. 

Two signature patterns for DHPS were developed. The first signature is located in the
N-terminal section of these enzymes, while the second signature is located in the central
section.

Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
Consensus pattern: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[ 1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol.
172:7211-7226(1990).

[ 2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene
112:213-218(1992).

792. Phosphatidylinositol 3- and 4-kinases signatures 

Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that
phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact
function of the three products of PI3-kinase - PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3) - is not
yet known, although it is proposed that they function as second messengers in cell signalling.
Currently, three forms of PI3-kinase are known:

- The mammalian enzyme which is a heterodimer of a 110 Kd catalytic chain (p110) and an
85 Kd subunit (p85) which allows it to bind to activated tyrosine protein kinases. There are at
least two different types of p110 subunits (alpha and beta).

- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. 
Both are proteins of about 280 Kd. 

- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a 
protein of about 100 Kd. 

- Arabidopsis thaliana and soybean VPS34 homologs. 

Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on
phosphatidylinositol (PI) in the first committed step in the production of the second
messenger inositol-1,4,5-trisphosphate. Currently the following forms of PI4-kinases are
known:

- Human PI4-kinase alpha. 

- Yeast PIK1, a nuclear protein of 120 Kd. 

- Yeast STT4, a protein of 214 Kd. 

The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this 
domain seems to be distantly related to the catalytic domain of protein kinases [2]. Two 
signature patterns were developed from the best conserved parts of this domain. 

Four additional proteins belong to this family: 

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for 
the cell-cycle arrest and immunosuppressive effects of the FKBP12-rapamycin complex. 

- Yeast protein ESR1 [6] which is required for cell growth, DNA repair and meiotic 
recombination. 

- Yeast protein TEL1 which is involved in controlling telomere length. 

- Yeast hypothetical protein YHR099w, a distantly related member of this family.

- Fission yeast hypothetical protein SpAC22E12.16C. 



Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q

Consensus pattern: [GS]-x-[AV]-x(3)-[LIVM]- ... -x(2)-N

[ 1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea
F., Thompson A., Totty N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell
70:419-429(1992).

[ 2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N., Hall M.N. Cell
73:585-596(1993).

[ 3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science
260:88-91(1993).

[ 4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J.
13:2352-2361(1994).

[ 5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. 
Nature 369:756-758(1994). 

[ 6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994). 

793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures

FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes
the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is
associated with the utilization of glycerol coupled to respiration. In Escherichia coli, two
isozymes are known: one expressed under anaerobic conditions (gene glpA) and one in
aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in
the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC
1.1.1.8) [2,3].

These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD- 
binding domain in their N-terminal extremity. The mammalian enzyme differs from the 
bacterial or yeast proteins by having an EF-hand calcium-binding region (See 
<PDOC00018>) in its C-terminal extremity. 

Two signature patterns were developed: one based on the first half of the FAD-binding
domain and one which corresponds to a conserved region in the central part of these
enzymes.

Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A
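
Because this family is described by two signatures with a stated relative position (FAD-binding region toward the N-terminus, second region in the central part), a sequence can be screened for both hits and for their order. The sketch below is a hedged illustration: the names SIGNATURES, signature_positions and looks_like_gpd and the test sequence are hypothetical, and requiring both hits in order is an assumed screening rule, not one stated in the source.

```python
import re

# The two GPD signatures from the text, as regular expressions.
SIGNATURES = {
    "fad_binding": re.compile(r"[IV]GGG.{2}G[STACV]G.A.D.{3}RG"),
    "central": re.compile(r"GGK.{2}[GSTE]YR.{2}A"),
}

def signature_positions(seq):
    """Map each signature name to the 0-based start of its first hit, or None."""
    positions = {}
    for name, regex in SIGNATURES.items():
        m = regex.search(seq)
        positions[name] = m.start() if m else None
    return positions

def looks_like_gpd(seq):
    """True when both signatures occur and the FAD-binding one comes first."""
    pos = signature_positions(seq)
    first, second = pos["fad_binding"], pos["central"]
    return first is not None and second is not None and first < second

# Hypothetical sequence: both motifs embedded in filler residues.
example = "MS" + "IGGGAAGSGLAADAAARG" + "T" * 100 + "GGKAAGYRAAA" + "L" * 50
```

A sequence that matches only one of the two signatures is reported through signature_positions with None for the missing one, which keeps partial evidence visible instead of discarding it.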

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).

[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).

[ 3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem.
269:14363-14366(1994).

794. NOL1/NOP2/sun family signature

The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1) which may play a role
in the regulation of the cell cycle and the increased nucleolar activity that is associated with
cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1) which could be involved in nucleolar function 
during the onset of growth, and in the maintenance of nucleolar structure. 

- Yeast hypothetical protein YBL024w. 

- Bacterial protein sun (also known as fmu). 

- Escherichia coli hypothetical protein yebU. 

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24. 

- Methanococcus jannaschii hypothetical protein MJ0026. 

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a
protein of about 430 to 450 residues and MJ0026 has 274 residues. They share a conserved
central domain which contains some highly conserved regions. One of these regions was 
selected as a signature pattern. 

Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
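
The consensus patterns in this section use the PROSITE notation: elements are separated by "-", "x" stands for any residue, "[...]" lists the residues allowed at a position, and "(n)" or "(n,m)" gives a repeat count or range. A minimal sketch of how such a pattern could be turned into a regular expression and run against a sequence is shown below. The function names and the test sequence are illustrative assumptions, not part of the original specification, and the sketch covers only the notation elements named here plus "{...}" exclusions and "<"/">" terminal anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Split off an optional repeat such as (2) or (5,7).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # excluded residues
        elif len(core) == 1 and core.isalpha() and core.isupper():
            token = core                     # one fixed residue
        else:
            raise ValueError(f"unsupported element {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


def find_signature(pattern: str, sequence: str):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(0)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Pattern of entry 794; the sequence is a made-up test string.
    nol1_pattern = "[FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]"
    print(prosite_to_regex(nol1_pattern))
    print(find_signature(nol1_pattern, "MKTFDKLLGDAPCSGEE"))
```

The same helper can be applied to the other consensus patterns listed in this section once OCR damage in the pattern text has been corrected.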

795. moaA/nifB/pqqE family signature

A number of proteins involved in the biosynthesis of metallo cofactors have been
shown [1,2] to be evolutionarily related. These proteins are:

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the
molybdenum cofactor (molybdopterin; MPT).

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis which is
highly similar to moaA.

- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ), which is involved in the biosynthesis of the nitrogenase
iron-molybdenum cofactor.

- Bacterial protein pqqE, which is involved in the biosynthesis of the cofactor
pyrrolo-quinoline-quinone (PQQ).

- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based 
tungsten cofactor. 

- Caenorhabditis elegans hypothetical protein F49E2.1. 

All these proteins share, in their N-terminal region, a conserved domain that contains three 
cysteines. In moaA, these cysteines have been shown [1] to be important for the biological 
activity. They could be involved in the binding of an iron-sulfur cluster.

Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C [The three C's are
putative Fe-S ligands]

[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

796. Forkhead-associated (FHA) domain profile 

The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain
found in a variety of otherwise unrelated proteins. The FHA domain comprises approximately
55 to 75 amino acids and contains three highly conserved blocks separated by divergent 
spacer regions. Currently it has been found in the following proteins: 

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte 
nuclear factor 1 (MNF1), yeast transcription factor FHL1, which probably controls
pre-mRNA processing, and yeast FKH1 and FKH2. In those proteins the FHA domain is located
N-terminal of the DNA-binding FH domain. 

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which 
specifically interacts with the receptor-type Ser/Thr-kinase RLK5. In KAPP, the FHA 
domain maps to a region that interacts with the receptor-type protein kinase RLK5 only if the 
kinase is phosphorylated on serine residues [2]. 

- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA 
damage: DUN1 and SPK1/SAD1 [3]. The latter is the only known protein containing two
copies of the FHA domain.

- Protein kinase cds1 from fission yeast, which contains an FHA domain and might be the ortholog of
SPK1.

- Protein kinase MEK1 from yeast, which is involved in meiotic recombination. 

- Human nuclear antigen Ki67 which is expressed only in proliferating cells. 

- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the 
FHA domain. 

- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil 
region C-terminal of the FHA domain. 

- Caenorhabditis elegans hypothetical protein ZK632.2. 

- Caenorhabditis elegans hypothetical protein C01G6.5. 

- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the 
FHA domain. 

- An ORF from the bacterium Streptomyces, which is on the opposite strand of the protein 
kinase pks1, overlapping the ORF of the kinase.

[ 1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995). 

[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).

[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

797. Ald_Xan_dh_C 

Aldehyde oxidase and xanthine dehydrogenase, C terminus 

[1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber 
R; Medline: 96072968 "Crystal structure of the xanthine oxidase-related aldehyde oxido- 
reductase from D. gigas." Science 1995;270:1170-1176. 

Number of members: 54 

798. Glyco_hydro_38 
Glycosyl hydrolases family 38 

Glycosyl hydrolases are key enzymes of carbohydrate metabolism. 



Number of members: 20

[1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans
1998;26:153-156.

799. HECT 

HECT-domain (ubiquitin-transferase).

The name HECT comes from Homologous to the E6-AP Carboxyl 
Terminus. 

Number of members: 43 

[1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family
of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc
Natl Acad Sci U S A 1995;92:2563-2567.

800. HRDC 
HRDC domain 

The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic 
acid binding. Mutations in the HRDC domain cause human disease. 

Number of members: 19

[1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative
nucleic acid-binding domain in Bloom's and Werner's syndrome helicases." Trends Biochem
Sci 1997;22:417-418.

801. Integrase 

Integrase mediates integration of a DNA copy of the viral genome into the host 
chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc 
binding domain. The central domain is the catalytic domain [1]. The carboxyl-terminal
domain is a DNA binding domain [2].



Number of members: 581

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline:
95099322; "Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other
polynucleotidyl transferases." Science 1994;266:1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM,
Gronenborn AM; Medline: 95359147; "Solution structure of the DNA binding domain of
HIV-1 integrase." Biochemistry 1995;34:9826-9833.

802. lig_chan 
Ligand-gated ion channel 

This family includes the four transmembrane regions of the ionotropic glutamate 
receptors and NMDA receptors. 

Number of members: 128 

[1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA
receptors by calcineurin." Science 1995;267:1510-1512.

803. RhoGAP 
RhoGAP domain 

GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases. 
Number of members: 97 

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the
breakpoint cluster region-homology domain from phosphoinositide 3-kinase p85 alpha
subunit." Proc Natl Acad Sci U S A 1996;93:14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ,
Musacchio A, Smerdon SJ, Eccleston JF; Medline: 97162209; "The structure of the
GTPase-activating domain from p50rhoGAP." Nature 1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ,
Smerdon SJ; Medline: 97404320; "Crystal structure of a small G protein in complex with the
GTPase-activating protein rhoGAP." Nature 1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its
relatives." Nature 1993;366:643-654.

804. vwd

von Willebrand factor type D domain 

[1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth
regulators related to connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92 

805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA
topoisomerase I is a zinc metalloprotein with three repetitive zinc-binding domains."
J Biol Chem 1988;263:15857-15859.

[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli
DNA topoisomerase I is part of a high-affinity DNA binding domain." Biochem Biophys Res
Commun 1998;251:509-514.

Number of members: 51

806. AIRC 
AIR carboxylase 

Members of this family catalyse the decarboxylation of
1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR). This family catalyses the sixth step of de novo purine
biosynthesis. Some members of this family contain two copies of this
domain.

Number of members: 35

807. Bromodomain signature and profile 

PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the
following proteins:

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated
factor p250) (gene CCG1). P250 is associated with the TFIID TATA-box binding protein and
seems essential for progression of the G1 phase of the cell cycle.

- Human RING3, a protein of unknown function encoded in the MHC class II locus. 

- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by 
binding specifically to phosphorylated CREB protein. 

- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper 
expression of other homeotic genes involved in pattern formation, such as Ubx. 

- Drosophila brahma protein (gene brm), a protein required for the activation of multiple 
homeotic genes. 

- Mammalian homologs of brahma. In human, three brahma-like proteins are known: 
SNF2a(hBRM), SNF2b, and BRG1. 

- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.

- Human peregrin (or Br140).

- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes 
including snRNAs. 

- Yeast GCN5, a general transcriptional activator operating in concert with certain other 
DNA-binding transcriptional activators, such as GCN4, HAP2/3/4 or ADA2. 

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis. 

- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and 
ADR6/SWI1 proteins. This SWI-complex is involved in transcriptional activation. 

- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes. 

- Caenorhabditis elegans protein cbp-1. 

- Yeast hypothetical protein YGR056w.

- Yeast hypothetical protein YKR008w. 

- Yeast hypothetical protein L9638.1. 

Some proteins contain a region which, while similar to some extent to a classical 
bromodomain, diverges from it by either lacking part of the domain or because of an 
insertion. These proteins are:

- Mammalian protein HRX (also known as All-1 or MLL), a protein involved in
translocations leading to acute leukemias and which possibly acts as a transcriptional
regulatory factor. HRX contains a region similar to the C-terminal half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has
a 23 amino-acid insertion.

- Yeast protein YTA7. This protein contains a region with significant similarity to the C- 
terminal half of the bromodomain. As it is a member of the AAA family (see 
<PDOC00572>) it is also in a functionally different context. 

The above proteins generally contain a single bromodomain, but some of them contain two
copies; this is the case for BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

The exact function of this domain is not yet known but it is thought to be involved in protein- 
protein interactions and it may be important for the assembly or activity of multicomponent 
complexes involved in transcriptional activation. 

The consensus pattern that has been developed spans a major part of the bromodomain; a 
more sensitive detection is available through the use of a profile which spans the whole 
domain. 
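
The difference between a consensus pattern and a profile can be illustrated with a small position-specific scoring matrix: a pattern either matches or fails at each position, whereas a profile gives every residue at every position a graded score and sums them. The sketch below is a simplified illustration under stated assumptions. The three aligned five-residue strings are hypothetical toy data (not bromodomain sequences), background frequencies are taken as uniform, and the actual PROSITE profile format additionally models insertions and deletions, which this sketch does not.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix from equal-length aligned sequences."""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in range(width):
        # Start every residue at the pseudocount so unseen residues
        # receive a finite (negative) score instead of minus infinity.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[column]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm, sequence):
    """Return (0-based start, score) of the best-scoring ungapped window."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    toy_alignment = ["FYHAL", "FYFAL", "FYYGL"]  # hypothetical toy data
    matrix = build_pssm(toy_alignment)
    print(best_window(matrix, "GGGFYHALGGG"))
```

Because every position contributes to the total, a window that deviates from the conserved residues at one or two positions can still score well, which is the sense in which a profile is more sensitive than a pattern.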

Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-
[LIVMFY]-x(3)-[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-
x(2)-N-[SACF]-x(2)-[FY]

References 

[ 1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids
Res. 20:2603-2603(1992).

[ 2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C.,
Kennison J.A. Cell 68:561-572(1992).

[ 3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

808. (CH) Actinin-type actin-binding domain signatures 

PROSITE cross-reference(s): PS00019; ACTININ_1, PS00020; ACTININ_2

Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of
intracellular structures [1]. The actin-binding domain of alpha-actinin seems to reside in the 
first 250 residues of the protein. A similar actin-binding domain has been found in the N- 
terminal region of many different actin-binding proteins [2,3]: 

- In the beta chain of spectrin (or fodrin). 

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which 
may play a role in anchoring the cytoskeleton to the plasma membrane. 

- In the slime mold gelation factor (or ABP-120). 

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to
membrane glycoproteins. 

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in 
that it contains two tandem copies of the actin-binding domain and that these copies are 
located in the C-terminal part of the protein. 

Two conserved regions were selected as signature patterns for this type of domain. The first of
these regions is located at the beginning of the domain, while the second one is located in the
central section and has been shown to be essential for the binding of actin.

Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-
[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2)

[ 1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[ 2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991). 
[ 3] Dubreuil R.R. BioEssays 13:219-226(1991). 

809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature 
PROSITE cross-reference(s): PS00077; COX1 

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein 
complexes that catalyze the terminal step in the respiratory chain: they 
transfer electrons from cytochrome c or a quinol to oxygen. Some terminal 
oxidases generate a transmembrane proton gradient across the plasma membrane 
(prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme
complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals)
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) 
is found in all heme-copper respiratory oxidases. The presence of a bimetallic 
center (formed by a high-spin heme and copper B) as well as a low-spin heme, 
both ligated to six conserved histidine residues near the outer side of four 
transmembrane spans within CO I is common to all family members [2-4]. 

In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to
multiple terminal oxidases. The enzyme complexes vary in heme and copper
composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a
variety of environmental growth conditions [1].

Recently, a component of an anaerobic respiratory chain has also been found to
contain the copper B binding signature of this family: nitric oxide reductase
(NOR), which exists in denitrifying species of Archaea and Eubacteria.

Enzymes that belong to this family are: 

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1) which uses cytochrome 
c as electron donor. The electrons are transferred via copper A (Cu(A)) and 
heme a to the bimetallic center of CO I that is formed by a penta- 
coordinated heme a and copper B (Cu(B)). Subunit 1 contains 12 
transmembrane regions. Cu(B) is said to be ligated to three of the 
conserved histidine residues within the transmembrane segments 6 and 7. 

- Quinol oxidase from prokaryotes that transfers electrons from a quinol to 
the binuclear center of polypeptide I. This category of enzymes includes 
Escherichia coli cytochrome O terminal oxidase complex which is a component 
of the aerobic respiratory chain that predominates when cells are grown at 
high aeration. 

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in 
nitrogen-fixing bacteroids living in root nodules. The high affinity for 
oxygen allows oxidative phosphorylation under low oxygen concentrations. A 
similar enzyme has been found in other purple bacteria.

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces
nitric oxide to nitrous oxide, a step in the denitrification of nitrate to
dinitrogen. It is a heterodimer of norC and the catalytic
subunit norB. The latter contains the 6 invariant histidine residues and 12
transmembrane segments [5].

As a signature pattern the copper-binding region was used. 

Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H
[The three H's are copper B ligands]

Note: cytochrome bd complexes do not belong to this family.
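
Since the consensus pattern for this family annotates three specific histidines as copper B ligands, a match can also report where those residues lie in a sequence. The following is a minimal sketch that assumes the pattern has been hand-translated into a regular expression with one capture group per annotated histidine. The function name and the test sequence are hypothetical and serve only to exercise the code.

```python
import re

# Hand-translated from the consensus pattern of this entry. The three
# capture groups mark the histidines annotated as copper B ligands; the
# variable spacer x(44,47) becomes .{44,47}.
CU_B_SIGNATURE = re.compile(
    r"[YWG][LIVFYWTA]{2}[VGS](H)[LNP].V.{44,47}(H)(H)"
)


def copper_b_ligand_positions(sequence):
    """Return 1-based positions of the three putative Cu(B) histidines.

    Returns None when the signature is not found. Only the first match
    in the sequence is reported.
    """
    match = CU_B_SIGNATURE.search(sequence)
    if match is None:
        return None
    return tuple(match.start(group) + 1 for group in (1, 2, 3))


if __name__ == "__main__":
    # Made-up sequence: a conforming core followed by a 45-residue spacer.
    synthetic = "MK" + "WLLGHLAV" + "A" * 45 + "HH" + "GG"
    print(copper_b_ligand_positions(synthetic))
```

Reporting positions rather than a simple yes/no makes it possible to compare the hits against known transmembrane segments, which is where the conserved histidines are described as lying.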



[1]
Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B.
J. Bacteriol. 176:5587-5600(1994).
[2]
Castresana J., Luebben M., Saraste M., Higgins D.G.
EMBO J. 13:2516-2525(1994).
[3]
Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[4]
Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987).
[5]
Saraste M., Castresana J.
FEBS Lett. 341:1-4(1994).

810. (dehydrog_molyb) Eukaryotic molybdopterin oxidoreductases signature 
PROSITE cross-reference(s): PS00559; MOLYBDOPTERIN_EUK



A number of different eukaryotic oxidoreductases that require and bind a
molybdopterin cofactor have been shown [1] to share a few regions of sequence
similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of
xanthine to uric acid with the concomitant reduction of NAD. Structurally,
this enzyme of about 1300 amino acids consists of at least three distinct
domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain
(see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal
Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into
acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its 
sequence and domain structure. 

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate 
to nitrite. Structurally, this enzyme of about 900 amino acids consists of
an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding
domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome 
reductase domain. 

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to 
sulfate. Structurally, this enzyme of about 460 amino acids consists of an 
N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding 
domain of these enzymes. The pattern used to detect these proteins is based
on one of them. It contains a cysteine residue which could be involved in 
binding the molybdopterin cofactor. 

Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-
x(2)-[DEN]-R-x(2)-[DE]

[1] 

Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle 
W.A., Bray R.C. 

Biochim. Biophys. Acta 1057:157-185(1991). 

811. (DNA_ligase) ATP-dependent DNA ligase signatures

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2

DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA
fragments by catalyzing the formation of an internucleotide ester bond between 
phosphate and deoxyribose. It is active during DNA replication, DNA repair and 
DNA recombination. There are two forms of DNA ligase: one requires ATP 
(EC 6.5.1.1), the other NAD (EC 6.5.1.2). 

Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. 
During the first step of the joining reaction, the ligase interacts with ATP 
to form a covalent enzyme-adenylate intermediate. A conserved lysine residue 
is the site of adenylation [1,2]. 

Apart from the active site region, the only conserved region common to all 
ATP-dependent DNA ligases is found [3] in the C-terminal section and contains 
a conserved glutamate as well as four positions with conserved basic residues. 

Signature patterns were developed for both conserved regions. 

Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site
residue]

Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-
[KRH]-x(3,5)-K-[LIVMFY]-K

Sequences known to belong to this class detected by the pattern: ALL, except
for archaebacterial DNA ligases.

[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).
[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).

812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2 



FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes
the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In 
bacteria [1] it is associated with the utilization of glycerol coupled to 
respiration. In Escherichia coli, two isozymes are known: one expressed under 
anaerobic conditions (gene glpA) and one in aerobic conditions (gene glpD). In 
eukaryotes, a mitochondrial form of GPD participates in the glycerol phosphate 
shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2, 
3]. 

These enzymes are proteins of about 60 to 70 Kd which contain a probable 
FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs 
from the bacterial or yeast proteins by having an EF-hand calcium-binding 
region (See <PDOC00018>) in its C-terminal extremity. 

Two signature patterns were developed: one based on the first half of the
FAD-binding domain and one which corresponds to a conserved region in the central
part of these enzymes.

Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G

Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1]
Austin D., Larson T.J.
J. Bacteriol. 173:101-107(1991).
[2]
Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993).
[3]
Brown L.J., McDonald M.J., Lehn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).

813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature
PROSITE cross-reference(s): PS01242; FPG 

Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase)
(gene fpg) is a bacterial enzyme involved in DNA repair which excises
oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine
(Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition
to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic
sites (AP sites). FPG is a monomeric protein of about 32 Kd which binds and
requires zinc for its activity.

The binding site for zinc seems to be located in the C-terminal part of the
enzyme where four conserved and essential [2] cysteines are located. A signature pattern
was developed based on this region.

Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]



[1]
Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).
[2]
O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J.
J. Biol. Chem. 268:9063-9070(1993).

814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature 
PROSITE cross-reference(s): PS00462; G_GLU_TRANSPEPTIDASE 

Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of 
the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino 
acid, a peptide or water (forming glutamate). GGT plays a key role in the
gamma-glutamyl cycle, a pathway for the synthesis and degradation of
glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of 
two polypeptide chains, a heavy and a light subunit, processed from a single 
chain precursor. The active site of GGT is known to be located in the light 
subunit. 

The sequences of mammalian and bacterial GGT show a number of regions of 
high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that 
convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 
7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related
to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases
are also composed of two subunits.

One of the conserved regions corresponds to the N-terminal extremity of the
mature light chains of these enzymes. This region was used as a signature 
pattern. 

Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-
[LIVM]-[NE]-x(1,2)-[FY]-G

[1]
Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).
[2]
Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).
[3]
Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).

815. G-protein gamma subunit profile

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA



Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in
the transduction of signals generated by transmembrane receptors. G proteins
consist of three subunits (alpha, beta, and gamma). The alpha subunit binds to 
and hydrolyzes GTP; the functions of the beta and gamma subunits are less 
clear but they seem to be required for the replacement of GDP by GTP as well 
as for membrane anchoring and receptor recognition. 

The gamma subunits are small proteins (from 70 to 110 residues) that are 
bound to the membrane via an isoprenyl group (either a farnesyl or a geranylgeranyl)
covalently linked to their C-terminus. In mammals there are at least
12 different isoforms of gamma subunits. 

The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein 
signalling, contains a G-protein gamma-like domain. 

A profile was developed that spans the complete length of the gamma 
subunit. 

[1] 

Pennington S.R. 

Protein Prof. 2:16-315(1995). 

816. GNS1/SUR4 family signature 
PROSITE cross-reference(s): PS01188; GNS1_SUR4 

The following group of eukaryotic integral membrane proteins, whose exact 
function has not yet clearly been established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan. 

- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose- 
signaling pathway that controls the expression of several genes that are 
transcriptionally regulated by glucose. 

- Yeast hypothetical protein YJL196c. 

- Caenorhabditis elegans hypothetical protein C40H1.4. 

- Caenorhabditis elegans hypothetical protein D2024.3.

The proteins have from 290 to 435 amino acid residues. Structurally, they seem
to be formed of three sections: an N-terminal region with two transmembrane
domains, a central hydrophilic loop and a C-terminal region that contains from
one to three transmembrane domains. A conserved region that contains three histidines
was selected as a signature pattern. This region is located in the
hydrophilic loop.

Consensus pattern: L-x-F-L-H-x-Y-H-H

[1]
Bairoch A.
Unpublished observations (1996).
[2]
El-Sherbeini M., Clemas J.A.
J. Bacteriol. 177:3227-3234(1995).
[3]
Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).

817. Immunoglobulins and major histocompatibility complex proteins signature 
PROSITE cross-reference(s): PS00290; IG_MHC 

The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two 
light chains and two heavy chains linked by disulfide bonds. There are two 
types of light chains: kappa and lambda, each composed of a constant domain 
(CL) and a variable domain (VL). There are five types of heavy chains: alpha, 
delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and 
three (in alpha, delta and gamma) or four (in epsilon and mu) constant 
domains (CH1 to CH4).

The major histocompatibility complex (MHC) molecules are made of two chains. 
In class I [2] the alpha chain is composed of three extracellular domains, a 
transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin)
is composed of a single extracellular domain. In class II [3],
both the alpha and the beta chains are composed of two extracellular domains, 
a transmembrane region and a cytoplasmic tail. 

It is known [4,5] that the Ig constant chain domains and a single
extracellular domain in each type of MHC chain are related. These
homologous domains are approximately one hundred amino acids long and
include a conserved intradomain disulfide bond. A small pattern
around the C-terminal cysteine involved in this disulfide bond can be used to detect
this category of Ig-related proteins.

Consensus pattern: [FY]-x-C-x-[VA]-x-H

Sequences known to belong to this class detected by the pattern:

- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.
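
The Ig/MHC pattern fixes only four of its seven positions, so it is worth estimating how often such a short pattern would match an unrelated sequence by chance. The sketch below does this arithmetic under a deliberately crude assumption, namely that all 20 amino acids occur independently with equal frequency; real frequencies differ (cysteine and histidine are comparatively rare), so the figures are only a rough guide. The function names are illustrative, and only fixed-length pattern elements are supported.

```python
import re
from fractions import Fraction


def _elements(pattern):
    """Yield (allowed residue count or None for x, repeat) per element."""
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?",
                         element.strip())
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.group(1), int(m.group(2) or 1)
        if core == "x":
            yield None, repeat              # any residue allowed
        elif core.startswith("["):
            yield len(set(core[1:-1])), repeat
        else:
            yield 1, repeat                 # one fixed residue


def chance_match_probability(pattern, alphabet_size=20):
    """Probability that one random window matches the pattern."""
    probability = Fraction(1)
    for allowed, repeat in _elements(pattern):
        if allowed is not None:
            probability *= Fraction(allowed, alphabet_size) ** repeat
    return probability


def expected_chance_hits(pattern, sequence_length, alphabet_size=20):
    """Expected number of chance matches in a random sequence."""
    width = sum(repeat for _, repeat in _elements(pattern))
    windows = max(sequence_length - width + 1, 0)
    return windows * chance_match_probability(pattern, alphabet_size)


if __name__ == "__main__":
    ig_mhc = "[FY]-x-C-x-[VA]-x-H"
    print(chance_match_probability(ig_mhc))
    print(float(expected_chance_hits(ig_mhc, 1006)))
```

Under the uniform assumption the per-window probability is (2/20)(1/20)(2/20)(1/20) = 1/40000, so occasional chance matches are to be expected when a large sequence collection is scanned.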

[1]
Gough N.
Trends Biochem. Sci. 6:203-205(1981).
[2]
Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).
[3]
Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).
[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).
[5]
Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).
[6]
Beck S., Barrel B.G.
Nature 331:269-272(1988).

818. (IGFBP) Insulin-like growth factor binding proteins signature 
PROSITE cross-reference(s): PS00222; IGF_BINDING 

The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding 
proteins in extracellular fluids with high affinity [1,2,3]. These IGF-binding 
proteins (IGFBP) prolong the half-life of the IGFs and have been shown to 
either inhibit or stimulate the growth promoting effects of the IGFs on cell
culture. They seem to alter the interaction of IGFs with their cell surface
receptors. There are at least six different IGFBPs and they are structurally 
related. 

The following growth-factor inducible proteins are structurally related to 
IGFBPs and could function as growth-factor binding proteins [4,5]: 

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10. 

- Human connective tissue growth factor (CTGF) and its mouse homolog, protein 
FISP-12. 

- Vertebrate protein NOV. 

As a signature pattern, a conserved cysteine-rich region located in the N-terminal
section of these proteins is used.

Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except
for IGFBP-6's.

[1]

Rechler M.M. 

Vitam. Horm. 47:1-114(1993). 
[2] 

Shimasaki S., Ling N. 

Prog. Growth Factor Res. 3:243-266(1991). 

[3] 

Clemmons D.R. 

Trends Endocrinol. Metab. 1:412-417(1990). 
[4] 

Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R. 

J. Cell Biol. 114:1285-1294(1991). 

[5] 

Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet
J., Perbal B.

Mol. Cell. Biol. 12:10-21(1992). 

819. LMWPc : Low molecular weight phosphotyrosine protein phosphatase 
Number of members: 34 

[1] Medline: 94329182. The crystal structure of a low-molecular-weight phosphotyrosine
protein phosphatase. Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature 
1994;370:575-578. 

820. (myosin_head) ATP/GTP-binding site motif A (P-loop) 
PROSITE cross-reference(s): PS00017; ATP_GTP_A 

From sequence comparisons and crystallographic data analysis it has been shown 
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP 
share a number of more or less conserved sequence motifs. The best conserved 
of these motifs is a glycine-rich region, which typically forms a flexible 
loop between a beta-strand and an alpha-helix. This loop interacts with one of 
the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the P-loop is found.
A number of protein families for which the relevance of the
presence of such a motif has been noted are listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>). 

- Myosin heavy chains. 

- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>). 

- Dynamins and dynamin-like proteins (see <PDOC00362>). 

- Guanylate kinase (see <PDOC00670>). 

- Thymidine kinase (see <PDOC00524>). 

- Thymidylate kinase (see <PDOC01034>). 

- Shikimate kinase (see <PDOC00868>). 

- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>). 

- ATP-binding proteins involved in 'active transport' (ABC transporters) [7]
(see <PDOC00185>).

- DNA and RNA helicases [8,9,10].

- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).

- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).

- Nuclear protein ran (see <PDOC00859>). 

- ADP-ribosylation factors family (see <PDOC00781>). 

- Bacterial dnaA protein (see <PDOC00771>). 

- Bacterial recA protein (see <PDOC00131>). 

- Bacterial recF protein (see <PDOC00539>). 

- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, Go, etc.).

- DNA mismatch repair proteins mutS family (see <PDOC00388>).

- Bacterial type II secretion system protein E (see <PDOC00567>). 

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is
completely different from that of the P-loop. Examples of such proteins are
the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the
case for tubulins or protein kinases. A special mention must be reserved for
adenylate kinase, in which there is a single deviation from the P-loop
pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
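
As a rough illustration of how the P-loop consensus can be applied, the sketch below scans a sequence with the regular-expression equivalent of [AG]-x(4)-G-K-[ST] and reports every hit, including overlapping ones. The function name `find_p_loops` and the example peptide are assumptions made for this sketch; they do not come from the original entry.

```python
import re

# Regular-expression form of the consensus [AG]-x(4)-G-K-[ST].
# Wrapping it in a lookahead lets overlapping hits be reported too.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

def find_p_loops(sequence: str):
    """Return (1-based start position, matched residues) for each P-loop hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]

# Illustrative peptide chosen for this sketch.
example = "MTEYKLVVVGAGGVGKSALTIQ"
for start, motif in find_p_loops(example):
    print(start, motif)
```

Because the last position is restricted to Ser or Thr, a sequence carrying Gly there, as described above for adenylate kinase, is not reported by this scan.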
[1]
Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).
[2]
Moller W., Amons R.
FEBS Lett. 186:1-7(1985).
[3]
Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4]
Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5]
Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]
Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).
[7]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).
[8]
Hodgman T.C.
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9]
Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
Nature 337:121-122(1989).
[10]
Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).

821. PE: PE family 

This family is named after a PE motif near the amino terminus of the domain. The PE family
of proteins all contain an amino-terminal region of about 110 amino acids. The carboxyl
termini of this family are variable and fall into several classes. The largest class of PE
proteins is the highly repetitive PGRS class, which has a high glycine content. The function
of these proteins is uncertain but it has been suggested that they may be related to antigenic
variation of Mycobacterium tuberculosis [1]. Number of members: 88

[1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the
complete genome sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D,
Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D,
Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd
S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.

822. (RNB) Ribonuclease II family signature 

PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II

On the basis of sequence similarities, the following bacterial and eukaryotic 
proteins seem to form a family: 

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase
II) (gene rnb) [1]. RNase II is an exonuclease involved in mRNA decay. It
degrades mRNA by hydrolyzing single-stranded polyribonucleotides
processively in the 3' to 5' direction.

- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be 
required for the expression of virulence genes at the posttranscriptional 
level. 

- Yeast protein SSD1 (or SRK1) which is implicated in the control of the cell
cycle G1 phase.

- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the
nucleotide-releasing activity of RCC1 on ran.

- Fission yeast protein dis3, which is implicated in mitotic control. 

- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3'
end processing and splicing.

- Yeast protein MSU1, which is involved in mitochondrial biogenesis. 

- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to
the carbonic anhydrase inhibitor acetazolamide. 

- Caenorhabditis elegans hypothetical protein F48E8.6. 

The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). While
their sequence is highly divergent they share a conserved domain in their
C-terminal section [4]. It is possible that this domain plays a role in a
putative exonuclease function that would be common to all these proteins. A signature pattern
was developed based on the core of this conserved domain.

Consensus pattern[HIHFYE]-[GSTAM]-[LW^ 
[SA]-P-[LIVMA]-[RQ]-[KR]-[FY]-x-D-x(3)-[HO] 

[1]
Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).
[2]
Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N.,
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3]
Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).
[4]
Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).



823. Src homology 2 (SH2) domain profile 
PROSITE cross-reference(s): PS50001; SH2

The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid
residues first identified as a conserved sequence region between the
oncoproteins Src and Fps [1]. Similar sequences were later found in many other
intracellular signal-transducing proteins [2]. SH2 domains function as
regulatory modules of intracellular signalling cascades by interacting with 
high affinity to phosphotyrosine-containing target peptides in a sequence- 
specific and strictly phosphorylation-dependent manner [3,4,5,6]. 

The SH2 domain has a conserved 3D structure consisting of two alpha helices 
and six to seven beta-strands. The core of the domain is formed by a 
continuous beta-meander composed of two connected beta-sheets [7]. 

So far, SH2 domains have been identified in the following proteins: 

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor)
protein tyrosine kinases. In particular in the Src, Abl, Btk, Csk and ZAP70
families of kinases.

- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two
copies of the SH2 domain are found in those proteins in between the
catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).

- Mammalian phosphatidyl inositol 3-kinase regulatory p85 subunit.

- Some vertebrate and invertebrate protein-tyrosine phosphatases. 

- Mammalian Ras GTPase-activating protein (GAP). 

- Adaptor proteins mediating binding of guanine nucleotide exchange factors 
to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 
and Drosophila DRK. 

- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the 
CDC24 family. 

- Miscellaneous proteins interacting with vertebrate receptor protein
tyrosine kinases: oncoprotein Crk, mammalian cytoplasmic proteins Nck, Shc.

- STAT proteins (signal transducers and activators of transcription). 

- Chicken tensin. 

- Yeast transcriptional control protein SPT6.

The profile developed to detect SH2 domains is based on a structural alignment
consisting of 8 gap-free blocks and 7 linker regions totaling 92 match
positions.

[1]
Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).
[2]
Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).
[3]
Marengere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).
[4]
Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).
[5]
Mayer B.J., Baltimore D.
Trends Cell. Biol. 3:8-13(1993).
[6]
Pawson T.
Nature 373:573-580(1995).
[7]
Kuriyan J., Cowburn D.
Curr. Opin. Struct. Biol. 3:828-837(1993).

824. Sulfate transporters signature 

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP

A number of proteins involved in the transport of sulfate across a membrane
as well as some yet uncharacterized proteins have been shown [1,2] to be
evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2). 

- Rat sulfate anion transporter 1 (SAT-1). 

- Mammalian DTDST, a probable sulfate transporter which, in Human, is 
involved in the genetic disease, diastrophic dysplasia (DTD). 

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata. 

- Human pendrin (gene PDS), which is involved in a number of hearing loss 
genetic diseases. 

- Human protein DRA (Down-Regulated in Adenoma). 

- Soybean early nodulin 70. 

- Escherichia coli hypothetical protein ychM. 

- Caenorhabditis elegans hypothetical protein F41D9.5. 

As expected by their transport function, these proteins are highly hydrophobic 
and seem to contain about 12 transmembrane domains. The best conserved region 
seems to be located in the second transmembrane region and is used as a 
signature pattern. 

Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-
x(3)-[GA]-[GST]-S-[KR]

[1] 

Sandal N.N., Marcker K.A. 

Trends Biochem. Sci. 19:19-19(1994). 

[2] 

Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T. 
Mol. Gen. Genet. 247:709-715(1995).

825. TYA: TYA transposon protein 

Ty are yeast transposons. A 5.7kb transcript codes for p3, a fusion protein of TYA and TYB.
The TYA protein is analogous to the gag protein of retroviruses. TYA is cleaved to form a
46kd protein which can form mature virion-like particles [1]. Number of members: 59



Attorney No. 2750-1237P 


[1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon
virus-like particles. Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ,
Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.

826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain. 

This family includes class II aldolases and adducins which have not been ascribed any
enzymatic function. Number of members: 37

References:

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase
from Escherichia coli. Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from
Escherichia coli as derived from the structure. Dreyer MK, Schulz GE; J Mol Biol
1996;259:458-466.

827. CBD_2 

Two tryptophan residues are involved in cellulose binding.

Cellulose binding domain found in bacteria. Number of members: 51

References:

[1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas
fimi by nuclear magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG,
Muhandiram DR, Harris-Brandts M, Carver JP, Kay LE, Harvey TS; Biochemistry
1995;34:6993-7009.

828. P 

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an 
additional highly conserved sequence of approximately 150 residues (P domain) located 
immediately downstream of the catalytic domain. 
Number of members: 91 



References:

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is
required for intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P,
Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone
convertases. Zhou A, Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem
1998;273:11107-11114.

829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020 

The following uncharacterized proteins have been shown [1] to share regions of
similarities:

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus
influenzae protein.

- Bacillus subtilis hypothetical protein ypsC.

- Synechocystis strain PCC 6803 hypothetical protein slr0064.

- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

These are hydrophilic proteins of 40 Kd to about 80 Kd. They can be
picked up in the database by the following pattern.

Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E

References:

[1] Bairoch A. Unpublished observations (1997).

830. Uncharacterized protein family UPF0031 signatures 

PROSITE cross-reference(s): PS01049; UPF0031_1; PS01050; UPF0031_2

The following uncharacterized proteins have been shown [1] to share regions of
similarities:

- Yeast chromosome XI hypothetical protein YKL151c.

- Caenorhabditis elegans hypothetical protein R107.2.

- Escherichia coli hypothetical protein yjeF.

- Bacillus subtilis hypothetical protein yxkO.

- Helicobacter pylori hypothetical protein HP1363.

- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.

- Mycobacterium leprae hypothetical protein B229_C2_201.

- Synechocystis strain PCC 6803 hypothetical protein sll1433.

- Methanococcus jannaschii hypothetical protein MJ1586.

These are proteins of about 30 to 40 Kd whose central region is well
conserved. They can be picked up in the database by the following patterns.

Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]

831. (ACOX) 
Acyl-CoA oxidase 

This is a family of Acyl-CoA oxidases EC:1.3.3.6. Acyl-coA oxidase converts acyl-CoA into 
trans-2-enoyl-CoA [1]. 

Number of members: 39 

[1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline:
98192624. Molecular characterization of a glyoxysomal long chain acyl-CoA oxidase that is
synthesized as a precursor of higher molecular mass in pumpkin. J Biol Chem
1998;273:8301-8307.

832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

This is a family of bifunctional enzymes catalysing the last steps in de novo purine
biosynthesis. The bifunctional enzyme is found in both prokaryotes and eukaryotes. The
second last step is catalysed by 5-aminoimidazole-4-carboxamide ribonucleotide
formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the formylation of AICAR
with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is
catalysed by IMP (inosine monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase),
cyclizing FAICAR (5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].

Number of members: 22 

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y,
Nomura S, Tsukamoto I; Medline: 97473523. Molecular cloning and expression of a rat
cDNA encoding 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP
cyclohydrolase [published erratum appears in Gene 1998 Feb 27;208(2):337]. Gene
1997;197:289-293.

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205. The human purH gene
product, 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP
cyclohydrolase. Cloning, sequencing, expression, purification, kinetic analysis, and domain
mapping. J Biol Chem 1996;271:2225-2233.



833. (AOX) 
Alternative oxidase 

The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons
are transferred directly from reduced ubiquinol to oxygen, forming water [2]. This pathway is
not coupled to ATP synthesis, is not inhibited by cyanide, and is a single step
process [1]. In rice the transcript levels of the alternative oxidase are increased by low
temperature [1].

Number of members: 27 

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211. Transcript
levels of tandem-arranged alternative oxidase genes in rice are increased by low
temperature. Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline:
96366413. Cloning and analysis of the alternative oxidase gene of Neurospora crassa.
Genetics 1996;142:129-140.

834. (APH) 

Protein kinases signatures and profile 

Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108;
PROTEIN_KINASE_ST, PS00109; PROTEIN_KINASE_TYR, PS50011;
PROTEIN_KINASE_DOM

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of 
proteins which share a conserved catalytic core common to both serine/threonine and tyrosine 
protein kinases. There are a number of conserved regions in the catalytic domain of protein 
kinases. Two of these regions have been selected to build signature patterns. The first region, 
which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch 
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP 
binding. The second region, which is located in the central part of the catalytic domain, 
contains a conserved aspartic acid residue which is important for the catalytic activity of the 
enzyme [6]; two signature patterns were derived for that region: one specific for serine/ 
threonine kinases and the other for tyrosine kinases. A profile was developed which is based 
on the alignment in [1] and covers the entire catalytic domain. 

Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x- 
[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds 
ATP] 

Sequences known to belong to this class detected by the pattern: the majority of known
protein kinases, but it fails to find a number of them, especially viral kinases, which are quite
divergent in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is
an active site residue]

Sequences known to belong to this class detected by the pattern: most serine/threonine
specific protein kinases, with 10 exceptions (half of them viral kinases) and also Epstein-Barr
virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the
conserved Lys and which are therefore detected by the tyrosine kinase specific pattern
described below.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3)
[D is an active site residue]

Sequences known to belong to this class detected by the pattern: tyrosine specific protein
kinases, with the exception of human ERBB3 and mouse blk. This pattern will also detect most
bacterial aminoglycoside phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10],
which are proteins structurally and evolutionarily related to protein kinases.

Sequences known to belong to this class detected by the profile: ALL, except for three viral
kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>) and
2-5A-dependent ribonucleases. Sequence similarities between these two families and the
eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis thaliana
kinase-like protein TMKL1, which seems to have lost its catalytic activity.

Note: if a protein analyzed includes the two protein kinase signatures, the probability of it
being a protein kinase is close to 100%.

Note: eukaryotic-type protein kinases have also been found in prokaryotes such as
Myxococcus xanthus [11] and Yersinia pseudotuberculosis.

Note: the patterns shown above have been updated since their publication in [7].

Note: this documentation entry is linked to both signature patterns and a profile. As the
profile is much more sensitive than the patterns, you should use it if you have access to the
necessary software tools to do so.
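
The remark that a protein carrying both kinase signatures is very likely a protein kinase can be sketched as a simple two-step check. The regular expressions below are direct transcriptions of the three consensus patterns given above; the function names and the toy sequences are assumptions made for this sketch, and the toy sequences were constructed by hand to satisfy the patterns rather than taken from any real protein.

```python
import re

# ATP-binding region signature (the final K binds ATP).
ATP_SITE = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD]."
    r"[GSTACLIVMFY].{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
# Serine/threonine kinase active-site signature (D is the active site residue).
SER_THR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
# Tyrosine kinase active-site signature (D is the active site residue).
TYR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def kinase_signatures(sequence: str) -> dict:
    """Report which of the three signatures occur in the sequence."""
    seq = sequence.upper()
    return {
        "atp_binding": ATP_SITE.search(seq) is not None,
        "ser_thr_active_site": SER_THR_SITE.search(seq) is not None,
        "tyr_active_site": TYR_SITE.search(seq) is not None,
    }

def has_both_signatures(sequence: str) -> bool:
    """True when the ATP-binding signature and an active-site signature are both found."""
    hits = kinase_signatures(sequence)
    return hits["atp_binding"] and (
        hits["ser_thr_active_site"] or hits["tyr_active_site"]
    )

# Hand-built toy sequence: an ATP-binding stretch, a spacer, and a Ser/Thr active site.
toy = "LGKGTFGKVKLARHKLTGEEVAIK" + "A" * 20 + "IVHRDLKAENLLL"
print(kinase_signatures(toy), has_both_signatures(toy))
```

As the note above states, the profile is considerably more sensitive than the patterns, so a negative result from this kind of pattern check does not rule out a protein kinase.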

References 

[ 1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995). 

[ 2] Hunter T., Meth. Enzymol. 200:3-37(1991).

[ 3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991). 

[ 4] Hanks S.K., Curr. Opin. Struct. Biol. 1:369-383(1991). 

[ 5] Hanks S.K., Quinn A.M., Hunter T., Science 241:42-52(1988). 

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S.,
Sowadski J.M., Science 253:407-414(1991).

[ 7] Bairoch A., Claverie J.-M., Nature 331:22(1988).

[ 8] Benner S., Nature 329:21-21(1987).

[ 9] Kirby R., J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S., Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M., Cell 67:995-1006(1991).

835. (Asp_Glu_race) 

Aspartate and glutamate racemases signatures 

Cross-reference(s): PS00923; ASP_GLU_RACEMASE_1, PS00924;
ASP_GLU_RACEMASE_2

Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionarily
related bacterial enzymes that do not seem to require a cofactor for their activity [1].

Glutamate racemase, which interconverts L-glutamate and D-glutamate, is required for the
biosynthesis of peptidoglycan and some peptide-based antibiotics such as gramicidin S. In
addition to characterized aspartate and glutamate racemases, this family also includes a
hypothetical protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two
conserved cysteines are present in the sequence of these enzymes. They are expected to play
a role in catalytic activity by acting as bases in proton abstraction from the substrate.
Signature patterns were developed for both cysteines.

Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]

[ 1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

836. (ATP-sulfurylase) 
ATP-sulfurylase

This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of
which are part of a bifunctional polypeptide chain associated with adenosyl phosphosulphate
(APS) kinase APS_kinase. Both enzymes are required for PAPS
(phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate APS from ATP and inorganic sulphate [1].

Number of members: 37 

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz
NB; Medline: 98337975. A member of a family of sulfate-activating enzymes causes murine
brachymorphism [published erratum appears in Proc Natl Acad Sci U S A 1998 Sep
29;95(20):12071]. Proc Natl Acad Sci U S A 1998;95:8681-8685.

[2] Rosenthal E, Leustek T; Medline: 96096529. A multifunctional Urechis caupo protein,
PAPS synthetase, has both ATP sulfurylase and APS kinase activities. Gene
1995;165:243-248.

837. (ATP-synt_F) 

ATP synthase (F/14-kDa) subunit 

This family includes 14-kDa subunit from vATPases [1], which is in the peripheral catalytic 
part of the complex [2]. The family also includes archaebacterial ATP synthase subunit F [3]. 

Number of members: 23 

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411. The Drosophila
melanogaster gene vha14 encoding a 14-kDa F-subunit of the vacuolar ATPase. Gene
1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416. Identification of a
14-kDa subunit associated with the catalytic sector of clathrin-coated vesicle H+-ATPase. J
Biol Chem 1996;271:3324-3327.

[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968.
Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Go1. J Biol Chem 1996;271:18843-18852.

838. (CBD_4) 
Starch binding domain 

Number of members: 48 

839. (CbiX) 

The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and
so may have a related function. Some CbiX proteins contain a striking histidine-rich region at
their C-terminus, which suggests that it might be involved in metal chelation [1].

Number of members: 6 

[1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126. Cobalamin
(vitamin B12) biosynthesis: identification and characterization of a Bacillus megaterium cobI
operon. Biochem J 1998;335:159-166.

840. (Complex1_51K)

Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures

Cross-reference(s): PS00644; COMPLEX1_51K_1, PS00645; COMPLEX1_51K_2

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner 
mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria 
(as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this 
bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP)
fragment of the enzyme. It seems to bind to NAD, FMN, and a 2Fe-2S cluster.

The 51 Kd subunit is highly similar to [3,4]: 

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF) which 
also binds to NAD, FMN, and a 2Fe-2S cluster. 

- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of
sequence similarities. The first one most probably corresponds to the NAD-binding site, the 
second to the FMN-binding site, and the third one, which contains three cysteines, to the iron- 
sulfur binding region. Signature patterns have been developed for the FMN-binding and for 
the 2Fe-2S binding regions. 

Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S

Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).

[ 3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol.
233:109-122(1993).

841. (DAP_epimerase) 
Diaminopimelate epimerase signature 

Cross-reference(s) PS01326; DAPEPIMERASE 

Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to
D,L-meso-diaminopimelate in the biosynthetic pathway leading from aspartate to lysine. This
enzyme is a protein of about 30 Kd. Two conserved cysteines seem [1] to function as the acid
and base in the catalytic mechanism. As a signature pattern, the region surrounding the first
of these two active site cysteines was selected.

Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for an Anabaena
dapF which has a Ser instead of the active site Cys.
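
When a pattern marks a functionally important residue, as with the active site cysteine here, a capture group can be used to report the position of that residue and not only the presence of the motif. The following is a minimal sketch; the function name `active_site_cysteines` and both test peptides are hypothetical and were written only to exercise the pattern.

```python
import re

# N-x-D-G-S-x(4)-C-G-N-[GA]-x-R, with the active-site Cys captured as group 1.
DAPF_SITE = re.compile(r"N.DGS.{4}(C)GN[GA].R")

def active_site_cysteines(sequence: str):
    """Return the 1-based position of the active-site Cys for every pattern hit."""
    seq = sequence.upper()
    return [m.start(1) + 1 for m in DAPF_SITE.finditer(seq)]

# Hypothetical peptides: the second carries Ser in place of the active-site Cys.
with_cys = "AANADGSEAEMCGNGIRAA"
with_ser = "AANADGSEAEMSGNGIRAA"
print(active_site_cysteines(with_cys))
print(active_site_cysteines(with_ser))
```

The second peptide mirrors the exception noted above: a sequence with Ser at the active site position is not detected by the pattern.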

[ 1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).

842. (DNA_gyraseB_C) 

DNA topoisomerase II signature 

Cross-reference(s): PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP- 
dependent and act by passing a DNA segment through a transient double-strand break. 
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in 
African Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three 
subunits (the product of genes 39, 52 and 60). In prokaryotes and in archaebacteria the 
enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In 
some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two
subunits (genes parC and parE). In eukaryotes, type II topoisomerase is a homodimer.

There are many regions of sequence homology between the different subtypes of 
topoisomerase II. The relation between the different subunits is shown in the following 
representation: 

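<------------------ About 1400 residues ----------------->

[-----Protein 39-*-----][----------Protein 52----------]  Phage T4

[--------gyrB----*-----][-------------gyrA-------------]  Prokaryote II
                                                          Archaebacteria

[--------parE----*-----][-------------parC-------------]  Prokaryote IV

[----------------*-------------------------------------]  Eukaryote and ASF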

'*': Position of the pattern. 

As a signature pattern for this family of proteins, a region that contains a highly conserved 
pentapeptide was selected. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 

Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] 

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[2] Bjornsti M.A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

843. (DUF16) 

Protein of unknown function 

The function of this protein is unknown. It appears to only occur in Mycoplasma 
pneumoniae. 

Number of members: 26 

[1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885
"Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae."
Nucleic Acids Res 1996;24:4420-4449.

844. (DUF21) 

Domain of unknown function 

This transmembrane region has no known function. Many of the sequences in this family are
annotated as hemolysins; however, this is due to a similarity to Swiss:Q54318, which does not
contain this domain. This domain is found in the N-terminus of the proteins adjacent to two
intracellular CBS domains.

Number of members: 42 

845. (DUF56) 

Integral membrane protein 

The members of this family are putative integral membrane proteins. The function of the
family is unknown; however, the family includes Sec59 from yeast. Sec59 is a dolichol
kinase EC:2.7.1.108, but it is not clear if the enzymatic activity resides in this region or its
N-terminal region.

Number of members: 13 

846. (DUF94) 

Domain of unknown function 

The function of this domain is unknown. It is found in both eukaryotes and archaebacteria.
The alignment contains a completely conserved aspartate residue that may be functionally
important. The eukaryotic domains contain three conserved cysteines and a histidine that
might be metal binding; however, these are absent in the archaebacterial proteins.

Number of members: 9 

847. (FF) 



FF domain

This domain may be involved in protein-protein interaction [1].

Number of members: 42

[1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often
accompanies WW domains." Trends Biochem Sci 1999;24:264-265.

848. (FLO_LFY)
Floricaula / Leafy protein 

This family consists of various plant development proteins that are homologues of the
floricaula (FLO) and Leafy (LFY) proteins, which are floral meristem identity proteins.
Mutations in the sequences of these proteins affect flower and leaf development. 

Number of members: 16 

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline:
97411151 "UNIFOLIATA regulates leaf and flower morphogenesis in pea." Curr Biol
1997;7:581-587.

[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452
"LEAFY controls floral meristem identity in Arabidopsis." Cell 1992;69:843-859.

849. (G-patch) 
G-patch domain 

This domain is found in a number of RNA binding proteins, and is also found in proteins that 
contain RNA binding domains. This suggests that this domain may have an RNA binding 
function. This domain has seven highly conserved glycines. 



Number of members: 47

[1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in
eukaryotic RNA-processing proteins and type D retroviral polyproteins." Trends Biochem
Sci 1999;24:342-344.

850. (Gram-ve_porins)

General diffusion Gram-negative porins signature 
Cross-reference(s) PS00576; GRAM_NEG_PORIN 

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic 
compounds. Proteins, known as porins [1], are responsible for the 'molecular sieve' properties 
of the outer membrane. Porins form large water-filled channels which allow the diffusion of
hydrophilic molecules into the periplasmic space. Some porins form general diffusion
channels that allow any solutes up to a certain size (that size is known as the exclusion limit)
to cross the membrane, while other porins are specific for a solute and contain a binding site
for that solute inside the pores (these are known as selective porins). As porins are the major
outer membrane proteins, they also serve as receptor sites for the binding of phages and
bacteriocins. General diffusion porins generally assemble as trimers in the membrane and the
transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been
shown [3] that a number of general porins are evolutionarily related; these porins are:

- Enterobacteria phoE. 

- Enterobacteria ompC. 

- Enterobacteria ompF. 

- Enterobacteria nmpC. 

- Bacteriophage PA-2 LC. 

- Neisseria PI.A.

- Neisseria PI.B. 

As a signature pattern a conserved region was selected, located in the C-terminal part of these 
proteins, which spans two putative transmembrane beta strands. 

Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991). 

851. (HlyD) 

HlyD family secretion proteins signature 
Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth
medium by a mechanism that does not require a cleaved N-terminal signal sequence. These
proteins, while having different functions, require the help of two or more proteins for their
secretion across the cell envelope. Among these helpers are a protein belonging to the ABC
transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a
family which is currently composed [1 to 5] of the following members:

Gene   Species                   Protein which is exported
-----  ------------------------  -------------------------
hlyD   Escherichia coli          Hemolysin
appD   A.pleuropneumoniae        Hemolysin
lcnD   Lactococcus lactis        Lactococcin A
lktD   A.actinomycetemcomitans,  Leukotoxin
       Pasteurella haemolytica
rtxD   A.pleuropneumoniae        Toxin-III
cyaD   Bordetella pertussis      Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli          Colicin V
prtE   Erwinia chrysanthemi      Extracellular proteases B and C
aprE   Pseudomonas aeruginosa    Alkaline protease
emrA   Escherichia coli          Drugs and toxins
yjcR   Escherichia coli          Unknown

These proteins are evolutionarily related and consist of from 390 to 480 amino acid residues.
They seem to be anchored in the inner membrane by an N-terminal transmembrane region.
Their exact role in the secretion process is not yet known. The C-terminal section of these 
proteins is the best conserved region; a signature pattern from that region was derived.

Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-[LIVMFYW](2)-x-[LIVMFYW](3)

Sequences known to belong to this class detected by the pattern ALL, except for emrA and 
yjcR. 

References: 

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990). 

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ.
Microbiol. 58:1952-1961(1992).

[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992). 
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994). 

852. (IBR) 

In Between Ring fingers 

The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers 
(zf-C3HC4). The function of this domain is unknown. This domain has also been called the 
C6HC domain and DRIL (for double RING finger linked) domain [2].

Number of members: 25

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends
Biochem Sci 1999;24:229-231.

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline:
99349709 "TRIADs: a new class of proteins with a novel cysteine-rich signature." Protein
Sci 1999;8:1557-1561.



853. (IPPT) 
IPP transferase

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 "The
modified nucleoside 2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is
required for expression of virulence genes." J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline:
94187700 "Subcellular locations of MOD5 proteins: mapping of sequences sufficient for
targeting to mitochondria and demonstration that mitochondrial and nuclear isoforms
commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.

[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 "MOD5
translation initiation sites determine N6-isopentenyladenosine modification of mitochondrial
and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

854. (KE2) 

KE2 family protein 

The function of members of this family is unknown, although they have been suggested to 
contain a DNA binding leucine zipper motif [2]. 

Number of members: 9 

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed
gene KE2 from the mouse H-2K region." Gene 1991;107:345-346.

[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene
encoding a protein showing homology to mouse KE2 and containing a putative
leucine-zipper motif." Gene 1994;151:197-201.

855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site 
Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN 

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, 
which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The 
peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).

- Escherichia coli lipoprotein-28 (gene nlpA).

- Escherichia coli lipoprotein-34 (gene nlpB).

- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene osmB).

- Escherichia coli osmotically inducible lipoprotein E (gene osmE).

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).

- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).

- Escherichia coli copper homeostasis protein cutF (or nlpE).

- Escherichia coli plasmids traT proteins.

- Escherichia coli Col plasmids lysis proteins.

- A number of Bacillus beta-lactamases.

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).

- Chlamydia trachomatis outer membrane protein 3 (gene omp3).

- Fibrobacter succinogenes endoglucanase cel-3.

- Haemophilus influenzae proteins Pal and Pcp.

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).

- Neisseria outer membrane protein H.8.

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl.

- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).

- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb).

- Yersinia virulence plasmid protein yscJ.

- Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding
protein. This is the first archaebacterial protein known to be modified in such a fashion.

From the precursor sequences of all these proteins, a consensus pattern and a set of rules 
to identify this type of post-translational modification were derived. 

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is
the lipid attachment site]

Additional rules:

1) The cysteine must be between positions 15 and 35 of the sequence in consideration.

2) There must be at least one Lys or one Arg in the first seven positions of the sequence.

Sequences known to belong to this class detected by the pattern: ALL.

Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
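
The consensus pattern and the two additional rules can be combined in a short check. The Python sketch below is only one possible reading of them, offered for illustration: the function name lipid_attachment_sites and the test sequences are hypothetical, positions are counted from 1 at the first residue of the precursor, and the regular expression is a hand translation of the pattern quoted above.

```python
import re

# Hand translation of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# The lookahead lets overlapping candidate sites be reported as well.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(seq):
    """Return 1-based positions of cysteines that satisfy the pattern and both rules."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched segment.
        cys_pos = m.start(1) + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Hypothetical precursor N-termini, used for illustration only.
accepted = lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN")
no_basic = lipid_attachment_sites("MAATALVLGAVILGSTLLAGCSSN")
too_early = lipid_attachment_sites("MKAILGSTLLAGCSS")
print(accepted, no_basic, too_early)
```

The second and third sequences match the pattern itself but are rejected by rule 2 and rule 1 respectively, which shows why the rules are needed in addition to the pattern.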

References 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990). 
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989). 

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol.
Chem. 269:14939-14945(1994).

856. (Lipoprotein_7) 
Adhesin lipoprotein 

This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins 
from Mycoplasma hominis. M. hominis is a mycoplasma associated with human urogenital 
diseases, pneumonia, and septic arthritis [1]. An adhesin is a cell surface molecule that 
mediates adhesion to other cells or to the surrounding surface or substrate. The Vaa antigen is 
a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a 
periodic peptide structure, and is highly immunogenic in the human host [1]. p50 is also a
50-kDa lipoprotein, having three repeats A, B and C, that may be a tetramer of 191 kDa in its
native environment [2].

Number of members: 18 

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation
of a Mycoplasma hominis adhesin encoded by divergent vaa genes." Infect Immun
1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675
"Repetitive elements of the Mycoplasma hominis adhesin p50 can be differentiated by
monoclonal antibodies." Infect Immun 1996;64:4027-4034.

857. (MaoC_like)
MaoC like domain 

The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17
beta-dehydrogenase 4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase
beta subunit. All these enzymes contain other domains. This domain is also present in the
NodN nodulation protein N. No specific function has been assigned to this region of any of
these proteins. The maoC gene is part of an operon with maoA which is involved in the
synthesis of monoamine oxidase [1]. 

Number of members: 46 

[1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A
monoamine-regulated Klebsiella aerogenes operon containing the monoamine oxidase
structural gene (maoA) and the maoC gene." J Bacteriol 1992;174:2485-2492.



858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide

This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving
complex (OEC) of plants and cyanobacteria. The protein is also known as the manganese- 
stabilizing protein as it is associated with the manganese complex of the OEC and may 
provide the ligands for the complex [1]. 

Number of members: 17 

[1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and 
mutational analysis of the gene encoding the Photosystem II manganese-stabilizing 
polypeptide of Synechocystis 6803." Mol Gen Genet 1988;212:418-425. 

859. (NAC) 

[1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV;
Medline: 99342100 "Comparative genomics of the Archaea (Euryarchaeota): evolution of
conserved protein families, the stable core, and the variable shell." Genome Res
1999;9:608-628.

Number of members: 27 



860. (Nop) 

Putative snoRNA binding domain 

This family consists of various pre-RNA processing ribonucleoproteins. The function of the
aligned region is unknown; however, it may be a common RNA, snoRNA, or Nop1p binding
domain. Nop5p (Nop58p) Swiss:Q12499 from yeast is the protein component of a
ribonucleoprotein required for pre-18S rRNA processing and is suggested to function
with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with
Nop1p and are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for
pre-mRNA splicing in S. cerevisiae [3].



Number of members: 23 



[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a
small nucleolar ribonucleoprotein component required for pre-18S rRNA processing in
yeast." J Biol Chem 1998;273:16453-16463.

[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 8038777 "Nucleolar KKE/D repeat
proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis."
Mol Cell Biol 1997;17:7088-7098.

[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869
"The PRP31 gene encodes a novel protein required for pre-mRNA splicing in Saccharomyces
cerevisiae." Nucleic Acids Res 1996;24:1164-1170.



861. (Nramp) 

Natural resistance-associated macrophage protein 

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1,
Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of
functionally related proteins defined by a conserved hydrophobic core of ten transmembrane
domains [5]. The members of this family of membrane proteins are divalent cation
transporters. Nramp1 is an integral membrane protein expressed exclusively in cells of the
immune system and is recruited to the membrane of a phagosome upon phagocytosis [1]. By
controlling divalent cation concentrations, Nramp1 may regulate the intraphagosomal
replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to
diseases including leprosy and tuberculosis; conversely, they might provide protection from
rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and
Zn2+, amongst others; it is expressed at high levels in the intestine and is a major
transferrin-independent iron uptake system in mammals [1]. The yeast proteins Smf1 and
Smf2 may also transport divalent cations [3].

Number of members: 36 

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance
to microbial infections." Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular
parasitism." Mol Microbiol 1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional
complementation of the yeast divalent cation transporter family SMF by NRAMP2, a
member of the mammalian natural resistance-associated macrophage protein family." J Biol
Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections:
comparative genomic analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline:
96036029 "Nramp defines a family of membrane proteins." Proc Natl Acad Sci U S A
1995;92:10089-10093.

862. (NTP_transf_2) 
Nucleotidyltransferase domain 

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient
nucleotidyltransferase superfamily." Trends Biochem Sci 1995;20:345-347.

863. (Paramyxo_P) 
Paramyxovirus P phosphoprotein 

This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and
bovine parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase
complex formed from the P and L proteins [1]. The exact role of the P protein in this complex
is unknown, but it is involved in multiple protein-protein interactions and in binding the
polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to
be important for the proper folding of the L protein [1]. The paramyxoviruses have a
negative sense ssRNA genome [1].

Number of members: 15

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual
Functions of the Sendai Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868
"The P gene of human parainfluenza virus type 1 encodes P and C proteins but not a
cysteine-rich V protein." J Virol 1991;65:3406-3410.

864. (Patatin) 

This family consists of various patatin glycoproteins from plants. The patatin protein 
accounts for up to 40% of the total soluble protein in potato tubers [2]. Patatin is a storage 
protein but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage 
of fatty acids from membrane lipids [2]. 

Number of members: 21

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a
non-sucrose-inducible patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of
the patatin multigene family of potato." Gene 1988;62:27-44.

865. (Pentapeptide_2) 
Pentapeptide repeats (8 copies) 

These repeats are found in many mycobacterial proteins. These repeats are most common in 
the PPE family of proteins, where they are found in the MPTR subfamily of PPE proteins. 
The function of these repeats is unknown. The repeat can be approximately described as 
XNXGX, where X can be any amino acid. These repeats are similar to Pentapeptide [1];
however, it is not clear if these two families are structurally related.
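
The approximate description XNXGX lends itself to a simple scan for tandem copies. The following Python sketch is a rough illustration only, not the method used to define the family: the function name find_pentapeptide_runs, the default threshold, and the test sequence are all hypothetical, and the scan only reports runs that stay in a single five-residue frame.

```python
import re

def find_pentapeptide_runs(seq, min_copies=2):
    """Return (start, copies) for each tandem run of XNXGX units; start is 1-based."""
    # Each unit is: any residue, Asn, any residue, Gly, any residue.
    run = re.compile(r"(?:.N.G.){%d,}" % min_copies)
    return [(m.start() + 1, len(m.group()) // 5) for m in run.finditer(seq)]

# Hypothetical sequence with three consecutive XNXGX units, for illustration only.
example = "MK" + "ANTGS" * 3 + "LLL"
runs = find_pentapeptide_runs(example)
print(runs)
```

Raising min_copies (for example to 8, the copy number given in the entry title) makes the scan stricter, at the cost of missing shorter or interrupted runs.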

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of
pentapeptide repeats in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K,
Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor
R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K,
Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium tuberculosis
from the complete genome sequence." Nature 1998;393:537-544.

866. (Peptidase_C13) 
Peptidase C13 family 

This family of peptidases is known as the hemoglobinase family because it contains a
globin-degrading enzyme from blood parasites, Swiss:P42665. However, relatives that have
other functions are found in plants and other organisms. Members of this family are
asparaginyl peptidases [1].

Number of members: 26 

[1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E,
Watts C, Barrett AJ; Medline: 97218252 "Cloning, isolation, and characterization of
mammalian legumain, an asparaginyl endopeptidase." J Biol Chem 1997;272:8090-8098.

867. (Pro_dh) 
Proline dehydrogenase 

Number of members: 25

[1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the
proline dehydrogenase and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the
multifunctional Escherichia coli PutA protein." J Mol Biol 1994;243:950-956.

868. (PsbP) 

This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or
PsbP, from various plants (where it is encoded by the nuclear genome) and Cyanobacteria.
The 23 kDa PsbP protein is required for PSII to be fully operational in vivo; it increases the
affinity of the water oxidation site for Cl- and provides the conditions required for high
affinity binding of Ca2+ [2].

Number of members: 25

[1] Rova EM, McEwen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation
and photoinhibition are competing in a mutant of Chlamydomonas reinhardtii lacking the
23-kDa extrinsic subunit of photosystem II." J Biol Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the
psbP gene encoding precursor of 23-kDa polypeptide of oxygen-evolving complex in
Arabidopsis thaliana and its expression in the wild-type and a constitutively
photomorphogenic mutant." DNA Res 1996;3:277-285.

869. (PUA) 

The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase,
was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine
synthases, a family of predicted ATPases that may be involved in RNA modification, and a
family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain
was detected in a family of eukaryotic proteins that also contain a domain homologous to the
translation initiation factor eIF1/SUI1; these proteins may comprise a novel type of
translation factors. Unexpectedly, the PUA domain was also detected in bacterial and yeast
glutamate kinases; this is compatible with the demonstrated role of these enzymes in the
regulation of the expression of other genes [1]. It is predicted that the PUA domain is an
RNA binding domain.

Number of members: 48 

[1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains
associated with the translation machinery." J Mol Evol 1999;48:291-302.

870. (RF1) 
eRF1-like proteins

Members of this family are peptide chain release factors. The eukaryotic Release Factor 1
proteins (eRF1s) are involved in termination of translation. The eRF1 protein is functional for
all stop codons and appears to abolish read-through of these codons. This family also
includes other proteins for which the precise molecular function is unknown. Many of them
are from Archaebacteria. These proteins may also be involved in translation termination but
this awaits experimental verification.

Number of members: 25

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I,
Haenni AL, Celis JE, Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic
protein family possessing properties of polypeptide chain release factor." [see comments]
Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL;
Medline: 97315314 "Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes
with suppressor tRNAs at all three termination codons in messenger RNA." Nucleic Acids
Res 1997;25:2254-2258.

871. (Ribosomal_L14e)
Ribosomal protein L14

This family includes the eukaryotic ribosomal protein L14.

Number of members: 15

872. (Ribosomal_S27)
Ribosomal protein S27a 

This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a which is
synthesized as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the
C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin
promotes their incorporation into nascent ribosomes by a transient metabolic stabilization and
is required for efficient ribosome biogenesis [3]. The ribosomal extension protein S27a
contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a
mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and
ribosomes, a source of proteins [2].

Number of members: 36 

873. (Spermine_synth) 
Spermine/spermidine synthase 

Spermine and spermidine are polyamines. This family includes spermidine synthase, which
catalyses the fifth (last) step in the biosynthesis of spermidine from arginine, and spermine
synthase.

Number of members: 39 

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of
two chicken cDNAs encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino
acids." Gene 1997;195:313-319.

[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin
extension as ribosomal protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors
are ribosomal proteins whose fusion to ubiquitin facilitates ribosome biogenesis." Nature
1989;338:394-401.

874. (Surp)
Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing
and identification of functional domains in vertebrate homologs to the Drosophila splicing
regulator, suppressor-of-white-apricot." J Biol Chem 1994;269:16170-16179.

This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White- 
APricot. It has been suggested that these domains may be RNA binding [1]. 

Number of members: 32 

875. (TFIIE) 

TFIIE alpha subunit

The general transcription factor TFIIE has an essential role in eukaryotic transcription
initiation together with RNA polymerase II and other general factors. Human TFIIE consists
of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta Swiss:P29084, and joins the
preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of the
conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from
archaebacteria that are also presumed to be TFIIE-alpha subunits, Swiss:O29501 [3].

Number of members: 12 

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline:
92065982 "Structural motifs and potential sigma homologies in the large subunit of human
general transcription factor TFIIE." Nature 1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of
two large subdomains in TFIIE-alpha on the basis of homology between Xenopus and human
sequences." Nucleic Acids Res 1992;20:5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn
M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC,
Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA,
McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343 "The complete
genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus
fulgidus." Nature 1997;390:364-370.

876. (Transglut_core) 

Cross-reference(s) PS00547; TRANSGLUTAMINASES 

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze 
the cross-linking of proteins by promoting the formation of isopeptide bonds between the 
gamma-carboxyl group of a glutamine in one polypeptide chain and the epsilon-amino group 
of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines 
to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma 
tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. 
Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other 
forms of transglutaminases are widely distributed in various organs, tissues and body fluids. 
Sequence data is available for the following forms of TGase: 

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis
and important for the formation of the cornified cell envelope (gene TGM1). 

- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the 
cytoplasm (gene TGM2). 

- Transglutaminase 3, responsible for the later stages of cell envelope formation in the 
epidermis and the hair follicle (gene TGM3). 

- Transglutaminase 4 (gene TGM4). 

A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The 
erythrocyte membrane band 4.2 protein, which probably plays an important role in regulating 
the shape of erythrocytes and their mechanical properties, is evolutionarily related to TGases.
However, the active site cysteine is substituted by an alanine and the 4.2 protein does not
show TGase activity.

Consensus pattern: [GT]-Q-[CA]-W-V-x-[SAH^
[LV]-G [The first C is the active site residue] Sequences known to belong to this class
detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Ichinose A., Bottenus R.K., Davie E.W. J. Biol. Chem. 265:13411-13414(1990).
[ 2] Greenberg C.S., Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).

877. (TruB_N) 

TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out 
the conversion of uracil bases to pseudouridine. This family includes TruB, a pseudouridylate 
synthase that specifically converts uracil 55 to pseudouridine in most tRNAs. This family 
also includes Cbf5p that modifies rRNA [2].

Number of members: 33 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification,
cloning, and properties of the tRNA psi 55 synthase from Escherichia coli." RNA
1995;1:102-112.

[2] Lafontaine DLJ, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D;
Medline: 98139521 "The box H + ACA snoRNAs carry Cbf5p, the putative rRNA
pseudouridine synthase." Genes Dev 1998;12:527-537.

878. (UDPGP) 

UTP-glucose-1-phosphate uridylyltransferase

This family consists of UTP-glucose-1-phosphate uridylyltransferases, EC:2.7.7.9. These
enzymes are also known as UDP-glucose pyrophosphorylase (UDPGP) and glucose-1-phosphate
uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase catalyses the
interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1].
UDP-glucose is an important intermediate in mammalian carbohydrate interconversion involved in
various metabolic roles depending on tissue type [1]. In Dictyostelium (slime mold) mutants
in this enzyme abort the development cycle [2]. Also within the family are
UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical
proteins from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and
Swiss:O51036.

Number of members: 18 

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence
differences between human muscle and liver cDNAs for UDPglucose pyrophosphorylase and
kinetic properties of the recombinant enzymes expressed in Escherichia coli." Eur J Biochem
1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose
pyrophosphorylase gene of Dictyostelium discoideum." Nucleic Acids Res
1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 The eukaryotic
UDP-N-acetylglucosamine pyrophosphorylases. Gene cloning, protein expression, and
catalytic mechanism. J Biol Chem 1998;273:14392-14397.

879. (UPF0044)

Uncharacterized protein family UPF0044 signature 
Cross-reference(s) PS01301; UPF0044 

The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqel. 

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus 
influenzae protein. 

- Methanococcus jannaschii hypothetical protein MJ0652. 

These are small proteins of 10 to 15 Kd. They can be picked up in the database 
by the following pattern. This pattern is located in the N-terminal part of 
these proteins.

Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-
[GA]-x(2)-G Sequences known to belong to this class detected by the pattern: ALL. Other
sequence(s) detected in SWISS-PROT: NONE.
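
The consensus patterns quoted in these entries can be turned into regular expressions for scanning a sequence. The following is a minimal sketch, assuming only the notation visible in this text (single residues, bracketed alternatives, x for any residue, and repeat counts in parentheses). The function name pattern_to_regex and the toy peptide are illustrative assumptions and are not taken from any database tool.

```python
import re

# One pattern element: a residue, a [set] of residues, or x, with an optional
# repeat count such as (3) or (16,18).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Translate a dash-separated consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognized pattern element: %r" % element)
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."  # x stands for any residue
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(residue + repeat)
    return "".join(parts)


UPF0044 = ("L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-"
           "[GA]-x(2)-G")
upf0044_regex = pattern_to_regex(UPF0044)

# Hypothetical toy peptide built to satisfy the pattern; not a real protein.
toy_peptide = "MM" + "LSAAAKAAAKSAGHALAPLAALGAAG" + "MM"
hit = re.search(upf0044_regex, toy_peptide)
```

Here hit.start() gives the 0-based offset of the match in the toy peptide. A pattern that is damaged or truncated makes the function raise ValueError rather than guess at the missing elements.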

880. (zf-A20) 
A20-like zinc finger 

A20- (an inhibitor of cell death)-like zinc fingers. The zinc 
finger mediates self-association in A20. These fingers also 
mediate IL-1-induced NF-kappa B activation.

Number of members: 22 

[1] Heyninck K, Beyaert R; Medline: 99126071 The cytokine-inducible zinc finger protein
A20 inhibits IL-1-induced NF-kappaB activation at the level of TRAF6. FEBS Lett
1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline:
96390831 "A20, an inhibitor of cell death, self-associates by its
zinc finger domain." FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 The tumor necrosis factor- 
inducible zinc finger protein A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB 
activation. Proc Natl Acad Sci U S A 1996;93:6721-6725. 

[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by
tumor necrosis factor alpha encodes a novel type of zinc finger protein." J Biol Chem
1990;265:14705-14708.

881. (zf-PARP) 

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2 

Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that 
catalyzes the covalent attachment of ADP-ribose units from NAD(+) to various nuclear
acceptor proteins. This post-translational modification of nuclear proteins is dependent
on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation as well as in the
regulation of the molecular events involved in the recovery of the cell from DNA damage.
Structurally, PARP, about 1000 amino-acid residues long, consists of three distinct
domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of 
zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The 
zinc finger domains of PARP seem to bind specifically to single-stranded DNA. DNA ligase 
III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar to 
those of PARP. 

Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The
three Cs and the H are zinc ligands] Sequences known to belong to this class detected by the
pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE. Sequences known to
belong to this class detected by the profile: ALL. Other sequence(s) detected in
SWISS-PROT: NONE.
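
Because the pattern marks three cysteines and one histidine as zinc ligands, capture groups can report where those residues fall in a sequence. The sketch below is a hand translation of the consensus pattern above into a Python regular expression. The function name zinc_ligand_positions and the toy sequence are assumptions made for illustration; the toy sequence is synthetic filler, not a real PARP fragment.

```python
import re

# Hand translation of the pattern; the four zinc ligands are capture groups 1-4.
# x(16,18) becomes .{16,18}, a gap of 16 to 18 residues of any kind.
PARP_ZN_FINGER = re.compile(
    r"(C)[KR].(C).{3}I.K.{3}[RG].{16,18}W[FYH](H).{2}(C)"
)


def zinc_ligand_positions(sequence):
    """Return, for each match, the 1-based positions of the four ligand residues."""
    hits = []
    for m in PARP_ZN_FINGER.finditer(sequence):
        hits.append([m.start(group) + 1 for group in range(1, 5)])
    return hits


# Hypothetical toy sequence: two leading residues, then a motif with a
# 16-residue gap, then two trailing residues.
toy = ("MM" + "CKAC" + "AAA" + "I" + "A" + "K" + "AAA" + "R"
       + "A" * 16 + "WFH" + "AA" + "C" + "MM")
ligands = zinc_ligand_positions(toy)
```

For the toy sequence, ligands holds one list with the positions of the two N-terminal cysteines, the histidine and the final cysteine.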

Note: This documentation entry is linked to both signature patterns and a profile. As the 
profile is much more sensitive than the patterns, you should use it if you have access to the 
necessary software tools to do so. 

[ 1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[ 2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[ 3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P.,
Shell B.K., Nash R.A., Schar P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol.
15:3206-3216(1995).

882. Adenylylsulfate kinase (APS_kinase)

Enzyme that catalyses the phosphorylation of adenylylsulfate to 3'-phosphoadenylylsulfate.
This domain contains an ATP binding P-loop motif. Number of members: 34

[1] MacRae IJ, Rose AB, Segel IH; Medline: 99003196 Adenosine 5'-phosphosulfate kinase
from Penicillium chrysogenum. Site-directed mutagenesis at putative phosphoryl-accepting
and ATP P-loop residues. J Biol Chem 1998;273:28583-28589.

883. DNA polymerase family B signature DNA_POLYMERASE_B (DNA_pol_B) 

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the 
accurate replication of DNA. They require either a small RNA molecule or a protein as a 
primer for the de novo synthesis of a DNA chain. On the basis of sequence similarity, a 
number of DNA polymerases have been grouped [1 to 7] under the designation of DNA 
polymerase family B. These are: 

- Higher eukaryotes polymerases alpha. 

- Higher eukaryotes polymerases delta. 

- Yeast polymerase I/alpha (gene POL1), polymerase II/epsilon (gene POL2), polymerase
III/delta (gene POL3) and polymerase REV3.

- Escherichia coli polymerase II (gene dinA or polB). 

- Archaebacterial polymerases. 

- Polymerases of viruses from the herpesviridae family. 

- Polymerases from Adenoviruses. 

- Polymerases from Baculoviruses. 

- Polymerases from Chlorella viruses. 

- Polymerases from Poxviruses. 

- Bacteriophage T4 polymerase. 

- Podoviridae bacteriophages Phi-29, M2 and PZA polymerase. 

- Tectiviridae bacteriophage PRD1 polymerase. 

- Polymerases encoded on mitochondrial linear DNA plasmids in various fungi and plants
(Kluyveromyces lactis pGKL1 and pGKL2, Agaricus bitorquis pEM, Ascobolus immersus
pAI2, Claviceps purpurea pCLK1, Neurospora Kalilo and Maranhar, maize S-1, etc).

Six regions of similarity (numbered from I to VI) are found in all or a subset of the above 
polymerases. The most conserved region (I) includes a conserved tetrapeptide with two 
aspartate residues. Its function is not yet known. However, it has been suggested [3] that it 
may be involved in binding a magnesium ion. This conserved region was selected as a 
signature for this family of DNA polymerases.

Consensus pattern: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast
polymerase II/epsilon, Agaricus bitorquis pEM and Sulfolobus solfataricus polymerase II.
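
The two aspartates of region I sit at fixed positions in this pattern, so changing either one removes the match. The short sketch below illustrates this with a hand-translated regular expression. The function name and both peptides are hypothetical test strings, not sequences taken from any polymerase.

```python
import re

# Hand translation of the family B consensus pattern quoted above.
DNA_POL_B = re.compile(r"[YA][GLIVMSTAC]DTD[SG][LIVMFTC].[LIVMSTAC]")


def has_family_b_signature(sequence):
    """True if the sequence contains at least one match to the signature."""
    return DNA_POL_B.search(sequence) is not None


# Hypothetical peptides: the second differs only by D -> N at the first
# aspartate of the D-T-D core.
intact_peptide = "KKYGDTDSVFVEE"
altered_peptide = "KKYGNTDSVFVEE"

intact_has_signature = has_family_b_signature(intact_peptide)
altered_has_signature = has_family_b_signature(altered_peptide)
```

A check of this kind only reports whether the signature is present. As the entry notes, some family members are not detected by the pattern, so a negative result does not rule out membership.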

[ 1] Jung G., Leavitt M.C., Hsieh J.-C., Ito J. Proc. Natl. Acad. Sci. U.S.A. 84:8287-
8291(1987).

[ 2] Bernad A., Zaballos A., Salas M., Blanco L. EMBO J. 6:4219-4225(1987).

[ 3] Argos P. Nucleic Acids Res. 16:9909-9916(1988).

[ 4] Wang T.S.-F., Wong S.W., Korn D. FASEB J. 3:14-21(1989).

[ 5] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).

[ 6] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991). 

[ 7] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993). 

884. DNA polymerase family X signature - DNA_POLYMERASE_X (DNA_polymeraseX)

DNA polymerases (EC 2.7.7.7) can be classified, on the basis of sequence similarity [1], into
at least four different groups: A, B, C and X. DNA polymerases that belong to family X are 
listed below [2]: 

- Vertebrate polymerase beta, involved in DNA repair. 

- Yeast polymerase IV (POL4) [3], an enzyme with similar characteristics to that of the 
mammalian polymerase beta. 

- Terminal deoxynucleotidyltransferase (TdT) (EC 2.7.7.31). TdT catalyzes the elongation of 
polydeoxynucleotide chains by terminal addition. One of the functions of this enzyme is the 
addition of nucleotides at the junction of rearranged Ig heavy chain and T cell receptor gene 
segments during the maturation of B and T cells. 

- African Swine Fever virus protein O174L [4].

- Fission yeast hypothetical protein SpAC2F7.06c. 

These enzymes are small (about 40 Kd) compared with other polymerases and their reaction 
mechanism operates via a distributive mode, i.e. they dissociate from the template-primer 
after addition of each nucleotide. 



As a signature pattern for this family of DNA polymerases, a highly conserved region that
contains a conserved arginine and two conserved aspartic acid residues was selected. The
latter together with the arginine have been shown [5] to be involved in primer binding in
polymerase beta.

Consensus pattern: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-
x(2)-[SAP] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).

[ 2] Matsukage A., Nishikawa K., Ooi T., Seto Y., Yamaguchi M. J. Biol. Chem. 262:8960-
8962(1987).

[ 3] Prasad R., Widen S.G., Singhal R.K., Watkins J., Prakash L., Wilson S.H. Nucleic Acids
Res. 21:5301-5307(1993).

[ 4] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela
E. Virology 208:249-278(1995).

[ 5] Date T., Yamamoto S., Tanihara K., Nishimoto Y., Matsukage A. Biochemistry 30:5286-
5292(1991).

885. DUF14 - Domain of unknown function 

This domain is found in glutamate synthase, tungsten formylmethanofuran dehydrogenase 
subunit c (FwdC) and molybdenum formylmethanofuran dehydrogenase subunit c (FmdC). 
It has no known function. Number of members: 52 

[1] Hochheimer A, Hedderich R, Thauer RK; Medline: 99035764 "The formylmethanofuran
dehydrogenase isoenzymes in Methanobacterium wolfei and Methanobacterium
thermoautotrophicum: induction of the molybdenum isoenzyme by molybdate and
constitutive synthesis of the tungsten isoenzyme." Arch Microbiol 1998;170:389-393.

886. DUF18-Domain of unknown function 

This domain of unknown function is found in several C. elegans proteins. The domain is 120 
amino acids long and rich in cysteine residues. There are 16 conserved cysteine positions in 
the domain. Number of members: 34 



887. DUF27-Domain of unknown function

This domain is found in a number of otherwise unrelated proteins. This domain is found at
the C-terminus of the macro-H2A histone protein Swiss:Q02874. This domain is found in
the non-structural proteins of several types of ssRNA viruses such as NSP2 from alphaviruses
Swiss:P03317. This domain is also found on its own in a family of proteins from bacteria
Swiss:P75918, archaebacteria Swiss:O59182 and eukaryotes Swiss:Q17432, suggesting that
it is involved in an important and ubiquitous cellular process. Number of members: 66 

888. DUF37-Domain of unknown function 

This domain is found in short (70 amino acid) hypothetical proteins from various bacteria. 
The domain contains three conserved cysteine residues. Swiss:Q44066 from Aeromonas 
hydrophila has been found to have hemolytic activity (unpublished). Number of members: 
19 

889. EGF-like domain signatures. (EGF-like) 

A sequence of about thirty to forty amino-acid residues found in the sequence of
epidermal growth factor (EGF) has been shown [1 to 6] to be present, in a more or less 
conserved form, in a large number of other, mostly animal proteins. The proteins currently 
known to contain one or more copies of an EGF-like pattern are listed below. 

- Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6 copies). 

- Agrin, a basal lamina protein that causes the aggregation of acetylcholine receptors on 
cultured muscle fibers (4 copies). 

- Amphiregulin, a growth factor (1 copy). 

- Betacellulin, a growth factor (1 copy). 

- Blastula proteins BP10 and Span from sea urchin which are thought to be involved in 
pattern formation (1 copy). 

- BM86, a glycoprotein antigen of cattle tick (7 copies). 

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone 
formation and which expresses metalloendopeptidase activity (1-2 copies). Homologous 
proteins are found in sea urchin - suBMP (1 copy) - and in Drosophila - the dorsal-ventral 
patterning protein tolloid (2 copies). 

- Caenorhabditis elegans developmental proteins lin-12 (13 copies) and glp-1 (10 copies). 

- Caenorhabditis elegans APX-1 protein, a patterning protein (4.5 copies). 

- Calcium-dependent serine proteinase (CASP) which degrades the extracellular matrix 
proteins type I and IV collagen and fibronectin (1 copy).

- Cartilage matrix protein CMP (1 copy).

- Cartilage oligomeric matrix protein COMP (4 copies). 

- Cell surface antigen 114/A10 (3 copies). 

- Cell surface glycoprotein complex transmembrane subunit ASGP-2 from rat (2 copies). 

- Coagulation associated proteins C, Z (2 copies) and S (4 copies). 

- Coagulation factors VII, IX, X and XII (2 copies). 

- Complement C1r components (1 copy).

- Complement C1s components (1 copy).

- Complement-activating component of Ra-reactive factor (RARF) (1 copy). 

- Complement components C6, C7, C8 alpha and beta chains, and C9 (1 copy). 

- Crumbs, an epithelial development protein from Drosophila (29 copies). 

- Epidermal growth factor precursor (7-9 copies). 

- Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy). 

- Fat protein, a Drosophila cadherin-related tumor suppressor (5 copies). 

- Fetal antigen 1, a probable neuroendocrine differentiation protein, which is derived from 
the delta-like protein (DLK) (6 copies). 

- Fibrillin 1 (47 copies) and fibrillin 2 (14 copies). 

- Fibropellins IA (21 copies), IB (13 copies), IC (8 copies), II (4 copies) and III (8 copies)
from the apical lamina - a component of the extracellular matrix - of sea urchin. 

- Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies). 

- Giant-lens protein (protein Argos), which regulates cell determination and axon guidance in 
the Drosophila eye (1 copy). 

- Growth factor-related proteins from various poxviruses (1 copy). 

- Gurken protein, a Drosophila developmental protein (1 copy). 

- Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor alpha 
(TGF-alpha), growth factors Lin-3 and Spitz (1 copy); the precursors are membrane proteins,
the mature form is located extracellularly.

- Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies). 

- LDL and VLDL receptors, which bind and transport low-density lipoproteins and very low- 
density lipoproteins (3 copies). 

- LDL receptor-related protein (LRP), which may act as a receptor for endocytosis of 
extracellular ligands (22 copies). 

- Leucocyte antigen CD97 (3 copies), cell surface glycoprotein EMR1 (6 copies) and cell 
surface glycoprotein F4/80 (7 copies).

- Limulus clotting factor C, which is involved in hemostasis and host defense mechanisms in
Japanese horseshoe crab (1 copy). 

- Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy). 

- Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies). 

- Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy). 

- Neurexins from mammals (3 copies). 

- Neurogenic proteins Notch, Xotch and the human homolog Tan-1 (36 copies), Delta (9 
copies) and the similar differentiation proteins Lag-2 from Caenorhabditis elegans (2 copies), 
Serrate (14 copies) and Slit (7 copies) from Drosophila. 

- Nidogen (also called entactin), a basement membrane protein from chordates (2-6 copies). 

- Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies). 

- Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy). 

- Perforin, which lyses non-specifically a variety of target cells (1 copy). 

- Proteoglycans aggrecan (1 copy), versican (2 copies), perlecan (at least 2 copies), brevican 
(1 copy) and chondroitin sulfate proteoglycan (gene PG-M) (2 copies). 

- Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which is found in the 
endoplasmic reticulum.

- S1-5, a human extracellular protein whose ultimate activity is probably modulated by the
environment (5 copies). 

- Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well as a 
mitogen for different target cells (1 copy). 

- Selectins. Cell adhesion proteins such as ELAM-1 (E-selectin), GMP-140 (P-selectin), or 
the lymph-node homing receptor (L-selectin) (1 copy). 

- Serine/threonine-protein kinase homolog (gene Pro25) from Arabidopsis thaliana, which 
may be involved in assembly or regulation of light-harvesting chlorophyll A/B protein (2 
copies). 

- Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy). 

- Stromal cell derived protein-1 (SCP-1) from mouse (6 copies). 

- TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy). 

- Tenascin (or neuronectin), an extracellular matrix protein from mammals (14.5 copies), 
chicken (TEN-A) (13.5 copies) and the related proteins human tenascin-X (18 copies) and
tenascin-like proteins TEN-A and TEN-M from Drosophila (8 copies). 

- Thrombomodulin (fetomodulin), which together with thrombin activates protein C (6 
copies). 



- Thrombospondin 1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins that mediate 
cell-to-cell and cell-to-matrix interactions. 

- Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy). 

- Transforming growth factor beta-1 binding protein (TGF-B1-BP) (16 or 18 copies). 

- Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies). 

- Urokinase-type plasminogen activator (EC 3.4.21.73) (UPA) and tissue plasminogen 
activator (EC 3.4.21.68) (TPA) (1 copy). 

- Uromodulin (Tamm-Horsfall urinary glycoprotein) (THP) (3 copies).

- Vitamin K-dependent anticoagulants protein C (2 copies) and protein S (4 copies) and the 
similar protein Z, a single-chain plasma glycoprotein of unknown function (2 copies). 

- 63 Kd sperm flagellar membrane protein from sea urchin (3 copies). 

- 93 Kd protein (gene nel) from chicken (5 copies). 

- Hypothetical 337.6 Kd protein T20G5.3 from Caenorhabditis elegans (44 copies). 

The functional significance of EGF domains in what appear to be unrelated proteins is not yet
clear. However, a common feature is that these repeats are found in the extracellular domain 
of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin 
G/H synthase). The EGF domain includes six cysteine residues which have been shown (in 
EGF) to be involved in disulfide bonds. The main structure is a two-stranded beta-sheet 
followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the 
conserved cysteines strongly vary in length as shown in the following schematic 
representation of the EGF-like domain: 

x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x

'C': conserved cysteine involved in a disulfide bond.
'G': often conserved glycine
'a': often conserved aromatic amino acid
'*': position of both patterns.
'x': any residue

The region between the 5th and 6th cysteine contains two conserved glycines of which at
least one is present in most EGF-like domains. Two patterns were created for this domain, 
each including one of these C-terminal conserved glycine residues. 

Consensus pattern: C-x-C-x(5)-G-x(2)-C [The 3 C's are involved in disulfide bonds]
Sequences known to belong to this class detected by the pattern: A majority, but not those that
have very long or very short regions between the last 3 conserved cysteines of their EGF-like
domain(s). Other sequence(s) detected in SWISS-PROT: 87 proteins, of which 27 can be
considered as possible candidates.

Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C [The three C's are involved in disulfide
bonds] Sequences known to belong to this class detected by the pattern: A majority, but not
those that have very long or very short regions between the last 3 conserved cysteines of their
EGF-like domain(s). Other sequence(s) detected in SWISS-PROT: 83 proteins, of which 49
can be considered as possible candidates. Note: The beta chain of the integrin family of
proteins contains 2 cysteine-rich repeats which were said to be dissimilar with the EGF
pattern [7].
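
Both EGF-like patterns cover the last three conserved cysteines, so a single domain can be reported by each of them at the same start position. The sketch below runs hand-translated versions of the two patterns over a short fragment. The names EGF_1, EGF_2 and egf_hits are illustrative labels, and the fragment is an assumption modeled on the C-terminal cysteine cluster of an EGF-like domain; it should be checked against a database record before being relied on.

```python
import re

# Hand translations of the two consensus patterns quoted above.
EGF_1 = re.compile(r"C.C.{5}G.{2}C")
EGF_2 = re.compile(r"C.C.{2}[GP][FYW].{4,8}C")


def egf_hits(sequence):
    """Return 1-based start positions of non-overlapping matches per pattern."""
    return {
        "EGF_1": [m.start() + 1 for m in EGF_1.finditer(sequence)],
        "EGF_2": [m.start() + 1 for m in EGF_2.finditer(sequence)],
    }


# Assumed example fragment (see the note above); 19 residues long.
fragment = "KYACNCVVGYIGERCQYRD"
hits = egf_hits(fragment)
```

Since the entry states that each pattern also picks up a number of unrelated SWISS-PROT sequences, a match from a scan like this is best read as a candidate rather than a confirmed EGF-like domain.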

Note: Laminin EGF-like repeats (see <PDOC00961>) are longer than the average EGF
module and contain a further disulfide bond C-terminal of the EGF-like region. Perlecan and
agrin contain both EGF-like domains and laminin-type EGF-like domains. Note: the patterns
do not detect all of the repeats of proteins with multiple EGF-like repeats. Note: see
<PDOC00913> for an entry describing specifically the subset of EGF-like domains that bind
calcium.

[ 1] Davis C.G. New Biol. 2:410-419(1990). 

[ 2] Blomquist M.C., Hunt L.T., Barker W.C. Proc. Natl. Acad. Sci. U.S.A. 81:7363- 
7367(1984). 

[ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G. Protein Nucl. Acid Enz. 29:54-
68(1986).

[ 4] Doolittle R.F., Feng D.F., Johnson M.S. Nature 307:558-560(1984).

[ 5] Appella E., Weber I.T., Blasi F. FEBS Lett. 231:1-4(1988).

[ 6] Campbell I.D., Bork P. Curr. Opin. Struct. Biol. 3:385-392(1993).

[ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C., Horwitz A.F., Hynes
R.O. Cell 46:271-282(1986).

890. Ham1 family (Ham1p_like)

This family consists of the HAM1 protein Swiss:P47119 and hypothetical archaeal, bacterial
and C. elegans proteins. HAM1 controls 6-N-hydroxylaminopurine (HAP) sensitivity and
mutagenesis in S. cerevisiae Swiss:P47119 [1]. The HAM1 protein protects the cell from
HAP, either on the level of deoxynucleoside triphosphate or the DNA level by a yet
unidentified set of reactions [1]. Number of members: 19

[1] Noskov VN, Staak K, Shcherbakova PV, Kozmin SG, Negishi K, Ono BC, Hayatsu H,
Pavlov YI; Medline: 96381244 "HAM1, the gene controlling 6-N-hydroxylaminopurine
sensitivity and mutagenesis in the yeast Saccharomyces cerevisiae." Yeast 1996;12:17-29.

891. (HCO3_cotransp)

Anion exchange is a cellular transport function which contributes to the regulation of cell pH
and volume. Anion exchangers are a family of functionally related proteins that contribute to
these properties by maintaining the intracellular level of the two principal anions: chloride
and HCO3-. The best characterized anion exchanger is the band 3 protein [1], which is an
erythrocyte anion exchange membrane glycoprotein. Band 3 is a protein of about 900 amino
acids which consists of a cytoplasmic N-terminal domain of about 400 residues and a
hydrophobic C-terminal section of about 500 residues that contains at least ten
transmembrane regions. The cytoplasmic domain provides binding sites for cytoskeletal
proteins, while the integral membrane domain is responsible for anion transport. Band 3
protein is specific to erythroid cells; at least two other proteins [2], structurally and
functionally related to band 3, are found in nonerythroid tissues:

- AE2 (or B3 related protein; B3RP), a protein of 1200 residues, which seems to be present 
in a variety of cell types including lymphoid, kidney, and choroid plexus. 

- AE3, a protein of 1200 residues, which is specific to neurons. 

Structurally AE2 and AE3 are very similar to band 3, the main difference being an extension 
of some 300 residues of the N-terminal domain in AE2 and AE3.

Two signature patterns were developed for these proteins. The first pattern is based on a
conserved stretch of sequence that contains four clustered positively charged residues and
which is located at the C-terminal extremity of the cytoplasmic domain, just before the first 
transmembrane segment from the integral domain. The second pattern is based on the 
perfectly conserved sequence of the fifth transmembrane segment; this segment contains a 
lysine, which is the covalent binding site for the isothiocyanate group of DIDS, an inhibitor 
of anion exchange. 

Consensus pattern: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y Sequences known to
belong to this class detected by the pattern: ALL.

Consensus pattern: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L Sequences known to belong to this
class detected by the pattern: ALL.

[ 1] Jay D., Cantley L. Annu. Rev. Biochem. 55:511-538(1986). 
[ 2] Reithmeier R.A.F. Curr. Opin. Struct. Biol. 3:515-523(1993). 

892. ATP phosphoribosyltransferase signature (HisG) 

ATP phosphoribosyltransferase (EC 2.4.2.17) is the enzyme that catalyzes the first step in the 
biosynthesis of histidine in bacteria, fungi and plants. It is a protein of about 23 to 32 Kd. As 
a signature pattern a region located in the C-terminal part of this enzyme was selected. 

Consensus pattern: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM]
Sequences known to belong to this class detected by the pattern: ALL.

893. HNH endonuclease (HNH)
Number of members: 56 

[1] Shub DA, Goodrich-Blair H, Eddy SR; Medline: 95117127 "Amino acid sequence motif
of group I intron endonucleases is conserved in open reading frames of group II introns."
Trends Biochem Sci 1994;19:402-404.

[2] Dalgaard JZ, Klar AJ, Moser MJ, Holley WR, Chatterjee A, Mian IS; Medline: 98026854
"Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases
and identification of an intein that encodes a site-specific endonuclease of the HNH family."
Nucleic Acids Res 1997;25:4626-4638.

[3] Gorbalenya AE; Medline: 95004046 "Self-splicing group I and group II introns encode
homologous (putative) DNA endonucleases of a new family." Protein Sci 1994;3:1117-1120.

894. NEUROHYPOPHYS_HORM (hormone5)

Oxytocin (or ocytocin) and vasopressin [1] are small (nine amino acid residues), structurally 
and functionally related neurohypophysial peptide hormones. Oxytocin causes contraction of 
the smooth muscle of the uterus and of the mammary gland while vasopressin has a direct 
antidiuretic action on the kidney and also causes vasoconstriction of the peripheral vessels. 
Like the majority of active peptides, both hormones are synthesized as larger protein 
precursors that are enzymatically converted to their mature forms. Peptides belonging to this 
family are also found in birds, fish, reptiles and amphibians (mesotocin, isotocin, valitocin, 
glumitocin, aspargtocin, vasotocin, seritocin, asvatocin, phasvatocin), in worms (annetocin), 
octopi (cephalotocin), locust (locupressin or neuropeptide F1/F2) and in molluscs 
(conopressins G and S) [2]. The pattern developed to detect this category of peptides spans 
their entire sequence and includes four invariant amino acid residues. 

Consensus pattern: C-[LIFY](2)-x-N-[CS]-P-x-G [The two Cs are linked by a disulfide
bond]. Sequences known to belong to this class detected by the pattern: ALL.
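
Since this pattern spans the entire nine-residue hormone, it can be tested against a whole peptide rather than searched for inside a longer sequence. The sketch below does this with a hand-translated regular expression. The oxytocin and arginine vasopressin sequences are the commonly published nonapeptides and are not given in this text; the scrambled control and the function name are hypothetical.

```python
import re

# Hand translation of the consensus pattern quoted above.
NEUROHYPOPHYS = re.compile(r"C[LIFY]{2}.N[CS]P.G")

PEPTIDES = {
    "oxytocin": "CYIQNCPLG",
    "arginine vasopressin": "CYFQNCPRG",
    "scrambled control": "CPLGNCYIQ",  # same residues as oxytocin, reordered
}


def matches_whole_peptide(peptide):
    """True if the pattern accounts for the peptide from first to last residue."""
    return NEUROHYPOPHYS.fullmatch(peptide) is not None


results = {name: matches_whole_peptide(seq) for name, seq in PEPTIDES.items()}
```

Using fullmatch rather than search reflects the statement that the pattern covers the full length of these peptides; for a precursor protein, search would be the appropriate call.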

[ 1] Acher R., Chauvet J. Biochimie 70:1197-1207(1988). 

[ 2] Chauvet J., Michel G., Ouedraogo Y., Chou J., Chait B.T., Acher R. Int. J. Pept. Protein 
Res. 45:482-487(1995). 

895. 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK) 
All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. 
Most microorganisms must synthesize folate de novo because they lack the active transport 
system of higher vertebrate cells which allows these organisms to use dietary folates. 
Enzymes involved in folate biosynthesis are therefore targets for a variety of antimicrobial 
agents such as trimethoprim or sulfonamides. 7,8-dihydro-6-hydroxymethylpterin-
pyrophosphokinase (EC 2.7.6.3) (HPPK) catalyzes the attachment of pyrophosphate to
6-hydroxymethyl-7,8-dihydropterin to form 6-hydroxymethyl-7,8-dihydropteridine
pyrophosphate. This is the first step in a three-step pathway leading to 7,8-dihydrofolate.
Bacterial HPPK (gene folK or sulD) [1] is a protein of 160 to 270 amino acids. In the lower 
eukaryote Pneumocystis carinii, HPPK is the central domain of a multifunctional folate 
synthesis enzyme (gene fas) [2]. As a signature for HPPK, a conserved region located in the 
central section of these enzymes was selected. 

Consensus pattern: [KRHD]-x-[GA]-[PSAE]-R-x(2)-D-[LIV]-D-[LIVM](2)
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.
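
Consensus patterns written in this notation can be converted mechanically into regular expressions. The sketch below is one possible way to do so, not a method described in this specification. The helper name `prosite_to_regex` is hypothetical, and only the elements used in the patterns quoted here are handled: single residues, `x`, `[...]` sets, `{...}` exclusions and `(n)` or `(n,m)` repeat counts.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a pattern such as 'C-x(2)-[LIV]-{P}' into a regex string."""
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"  # repeat count
        parts.append(core)
    return "".join(parts)

# The HPPK signature quoted above.
HPPK = "[KRHD]-x-[GA]-[PSAE]-R-x(2)-D-[LIV]-D-[LIVM](2)"
hppk_regex = prosite_to_regex(HPPK)
print(hppk_regex)
```

The resulting string can be passed to `re.search` to scan a protein sequence given in one-letter code.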

[ 1] Talarico T.L., Ray P.H., Dev I.K., Merrill B.M., Dallas W.S. J. Bacteriol. 174:5971- 
5977(1992). 

[ 2] Volpes F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213- 
218(1992). 

896. Metalloenzyme superfamily (Metalloenzyme) 

This family includes phosphopentomutase Swiss:P07651 and 2,3-bisphosphoglycerate- 
independent phosphoglycerate mutase, Swiss:P37689. This family is also related to 
alk_phosphatase [1]. The alignment contains the most conserved residues that are probably
involved in metal binding and catalysis. Number of members: 34 

[1] Galperin MY, Bairoch A, Koonin EV; Medline: 99180418 A superfamily of
metalloenzymes unifies phosphopentomutase and cofactor-independent phosphoglycerate
mutase with alkaline phosphatases and sulfatases. Protein Sci 1998;7:1829-1835.

897. Penicillin amidase (Penicil_amidase) 

Penicillin amidase or penicillin acylase EC:3.5.1.11 catalyses the hydrolysis of
benzylpenicillin to phenylacetic acid and 6-aminopenicillanic acid (6-APA), a key
intermediate in the synthesis of penicillins [1]. Also in the family are cephalosporin acylase
(Swiss:P07662) and aculeacin A acylase (Swiss:P29958), which are involved in the synthesis of
related peptide antibiotics. Number of members: 13

[1] Verhaert RM, Riemens AM, van der Laan JM, van Duin J, Quax WJ; Medline: 97438505
Molecular cloning and analysis of the gene encoding the thermostable penicillin G acylase
from Alcaligenes faecalis. Appl Environ Microbiol 1997;63:3412-3418.

[2] Duggleby HJ, Tolley SP, Hill CP, Dodson EJ, Dodson G, Moody PC; Medline: 95115804 
Penicillin acylase has a single-amino-acid catalytic centre. Nature 1995;373:264-268.

898. Phosphoribosyl-AMP cyclohydrolase (PRA-CH) 

This enzyme catalyses the third step in the histidine biosynthetic pathway. It requires Zn ions 
for activity. Number of members: 13 

[1] D'Ordine RL, Klem TJ, Davisson VJ; Medline: 99129952 N1-(5'-
phosphoribosyl)adenosine-5'-monophosphate cyclohydrolase: purification and
characterization of a unique metalloenzyme. Biochemistry 1999;38:1537-1546.

899. Phosphoribosyl-ATP pyrophosphohydrolase (PRA-PH) 

This enzyme catalyses the second step in the histidine biosynthetic pathway. Number of 
members: 32 

[1] Keesey JK Jr, Bigelis R, Fink GR; Medline: 79216449 The product of the his4 gene 
cluster in Saccharomyces cerevisiae. A trifunctional polypeptide. J Biol Chem 1979 Aug
10;254:7427-7433. 

[2] Bruni CB, Carlomagno MS, Formisano S, Paolella G; Medline: 86310274 Primary and 
secondary structural homologies between the HIS4 gene product of Saccharomyces 
cerevisiae and the hisIE and hisD gene products of Escherichia coli and Salmonella 
typhimurium. Mol Gen Genet 1986;203:389-396.

900. Prokaryotic membrane lipoprotein lipid attachment site (PstS) 

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which 
is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase 
recognizes a conserved sequence and cuts upstream of a cysteine residue to which a 
glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). 

- Escherichia coli lipoprotein-28 (gene nlpA). 

- Escherichia coli lipoprotein-34 (gene nlpB). 

- Escherichia coli lipoprotein nlpC. 

- Escherichia coli lipoprotein nlpD. 

- Escherichia coli osmotically inducible lipoprotein B (gene osmB). 

- Escherichia coli osmotically inducible lipoprotein E (gene osmE). 

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).

- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).

- Escherichia coli copper homeostasis protein cutF (or nlpE). 

- Escherichia coli plasmids traT proteins. 

- Escherichia coli Col plasmids lysis proteins. 

- A number of Bacillus beta-lactamases. 

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). 

- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB). 

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). 

- Chlamydia trachomatis outer membrane protein 3 (gene omp3). 

- Fibrobacter succinogenes endoglucanase cel-3. 

- Haemophilus influenzae proteins Pal and Pep. 

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37. 

- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).

- Neisseria outer membrane protein H.8. 

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl. 

- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). 

- Rickettsia 17 Kd antigen. 

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM. 

- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). 

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb).

- Yersinia virulence plasmid protein yscJ. 

- Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding
protein. This is the first archaebacterial protein known to be modified in such a fashion.

From the precursor sequences of all these proteins, a consensus pattern and a set of rules
were derived to identify this type of post-translational modification.

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is
the lipid attachment site]
Additional rules:
1) The cysteine must be between positions 15 and 35 of the sequence in consideration.
2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them
are not membrane lipoproteins, but at least half of them could be.
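
The pattern and its two additional rules can be combined in a short check. The following is a sketch under stated assumptions rather than a definitive implementation: the function name `lipid_attachment_sites` is hypothetical, "between positions 15 and 35" is read as inclusive, and positions are counted from 1 at the N-terminus of the precursor sequence.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C as a regex.
# The lookahead lets overlapping candidate sites all be examined.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(seq: str) -> list[int]:
    """Return 1-based positions of cysteines meeting the pattern and both rules."""
    # Rule 2: at least one Lys or Arg within the first seven residues.
    if not re.search(r"[KR]", seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The pattern is 11 residues long and the cysteine is its last residue.
        cys_pos = m.start() + 11
        # Rule 1 (assumed inclusive): cysteine between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Illustrative precursor-like test sequence, not taken from this specification.
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN"))
```

As the text notes, a hit is only a prediction: some matching proteins are not membrane lipoproteins.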

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). 
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988). 
[ 3] von Heijne G. Protein Eng. 2:531-534(1989). 

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. 
Chem. 269:14939-14945(1994). 

901. Ribosome recycling factor (RRF) 

The ribosome recycling factor (RRF / ribosome release factor) dissociates the ribosome from 
the mRNA after termination of translation, and is essential for bacterial growth [1]. Thus
ribosomes are "recycled" and ready for another round of protein synthesis. Number of 
members: 27 

[1] Janosi L, Shimizu I, Kaji A; Medline: 94240115 Ribosome recycling factor (ribosome 
releasing factor) is essential for bacterial growth. Proc Natl Acad Sci U S A 1994;91:4249-
4253.



902. S-layer homology (SLH)



S-layers are paracrystalline mono-layered assemblies of (glyco)proteins which coat the 
surface of bacteria [1]. Several S-layer proteins and some other cell wall proteins contain one 
or more copies of a domain of about 50-60 residues, which has been called SLH (for S-layer 
homology) [2]. There is strong evidence that this domain serves as an anchor to the 
peptidoglycan [3]. The SLH domain has been found in: 

- S-layer glycoprotein of Acetogenium kivui (3 copies). 

- S-layer 125 Kd protein of Bacillus sphaericus (3 copies). 

- S-layer protein of Bacillus anthracis (3 copies). 

- S-layer protein of Bacillus licheniformis (3 copies). 

- S-layer protein (HWP) from Bacillus brevis strain HPD31 (3 copies). 

- Middle cell wall protein (MWP) from Bacillus brevis strain 47 (3 copies). 

- S-layer protein (P100) of Thermus thermophilus (1 copy).

- Outer membrane protein Omp-alpha from Thermotoga maritima (1 copy). 

- Cellulosome anchoring protein (gene ancA), outer layer protein B (OlpB) and a further 
potential cell surface glycoprotein from Clostridium thermocellum (3 copies; the first copy is 
missing its N-terminal third which is appended to the end of the third copy; may have arisen 
by circular permutation). 

- Amylopullulanase (gene amyB) from Thermoanaerobacter thermosulfurogenes (3 copies).

- Amylopullulanase (gene aapT) from Bacillus strain XAL-601 (3 copies). 

- Endoglucanase from Bacillus strain KSM-635 (3 copies). 

- Exoglucanase (gene xynX) from Clostridium thermocellum (3 copies). 

- Xylanase A (gene xynA) from Thermoanaerobacter saccharolyticum (2 copies; 3 copies if a 
frameshift is taken into account). 

- Protein involved in butirosin production (ButB) from Bacillus circulans (2 incomplete 
copies; 3 copies if three frameshifts are taken into account). 

- Two hypothetical proteins from Synechocystis strain PCC 6803 (1 copy each). 

- A hypothetical protein with sequence similarity to amylopullulanases found 3' of amylase 
gene from Bacillus circulans (fragment of 1 copy; 3 copies if two frameshifts are taken into 
account). 

SLH domains are found at the N- or C-termini of mature proteins. They occur in single copy 
followed by a predicted coiled coil domain, or in three contiguous copies. Structurally, the 
SLH domain is predicted to contain two alpha-helices flanking a beta strand. The SLH 
sequences are fairly divergent with an average identity of about 25%. It is however possible 
to build a sequence pattern that starts at the second position of the domain and that spans 3/4
of its length. 

Consensus pattern: [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[FYWPDA]-x(4)-[LIV]-x(2)-
[GTALV]-x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-
[LIVMA]-[STKR]-[RY]-x-[EQ]-x-[STALIVM]
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Beveridge T.J. Curr. Opin. Struct. Biol. 4:204-212(1994). 
[ 2] Lupas A., Engelhardt H., Peters J., Santarius U., Volker S., Baumeister W. J. Bacteriol.
176:1224-1233(1994). 

[ 3] Lemaire M., Ohayon H., Gounon P., Fujino T., Beguin P. J. Bacteriol. 177:2451- 
2459(1995). 


903. Queuine tRNA-ribosyltransferase (TGT) 

This is a family of queuine tRNA-ribosyltransferases EC:2.4.2.29, also known as tRNA-
guanine transglycosylase and guanine insertion enzyme. Queuine tRNA-ribosyltransferase
modifies tRNAs for asparagine, aspartic acid, histidine and tyrosine with queuine. It catalyses
the exchange of guanine-34 at the wobble position with 7-aminomethyl-7-deazaguanine, and
the addition of a cyclopentenediol moiety to 7-aminomethyl-7-deazaguanine-34 tRNA,
giving a hypermodified base queuine in the wobble position [1,2]. The aligned region contains
a zinc binding motif C-x-C-x2-C-x29-H, and important tRNA and 7-aminomethyl-
7-deazaguanine binding residues [1]. Number of members: 27
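
The zinc binding motif can be located in a sequence with a fixed-spacing regular expression. This is a minimal sketch, assuming that "x2" and "x29" denote exactly 2 and exactly 29 arbitrary residues. The function name `zinc_ligand_positions` is hypothetical and the test sequence is synthetic.

```python
import re

# C-x-C-x2-C-x29-H with exact spacings (an assumption about the notation).
ZINC_MOTIF = re.compile(r"C.C.{2}C.{29}H")

def zinc_ligand_positions(seq):
    """Return 1-based positions of the three Cys and the His, or None if absent."""
    m = ZINC_MOTIF.search(seq)
    if m is None:
        return None
    s = m.start()
    # Offsets within the 36-residue motif: C at 0, 2 and 5; H at 35.
    return (s + 1, s + 3, s + 6, s + 36)

# Synthetic example: two leading residues, the motif, two trailing residues.
example = "MA" + "CACAAC" + "A" * 29 + "H" + "AA"
print(zinc_ligand_positions(example))
```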

[1] Romier C, Reuter K, Suck D, Ficner R; Medline: 96256303 Crystal structure of tRNA-
guanine transglycosylase: RNA modification by base exchange. EMBO J 1996;15:2850-
2857.

[2] Garcia GA, Koch KA, Chong S; Medline: 93287116 tRNA-guanine transglycosylase
from Escherichia coli. Overexpression, purification and quaternary structure. J Mol Biol
1993;231:489-497.

904. ThiC Family (ThiC)

ThiC is found within the thiamine biosynthesis operon. ThiC is involved in pyrimidine
biosynthesis [2]. ThiC catalyzes the substitution of the pyrophosphate of 2-methyl-4-amino- 
5-hydroxymethylpyrimidine pyrophosphate by 4-methyl-5-(beta-hydroxyethyl)thiazole 
phosphate to yield thiamine phosphate [3]. Number of members: 12 

[1] Vander Horn PB, Backstrom AD, Stewart V, Begley TP; Medline: 93163063 Structural
genes for thiamine biosynthetic enzymes (thiCEFGH) in Escherichia coli K-12. J Bacteriol
1993;175:982-992.

[2] Begley TP, Downs DM, Ealick SE, McLafferty FW, Van Loon AP, Taylor S,
Campobasso N, Chiu HJ, Kinsland C, Reddick JJ, Xi J; Medline: 99311269 Thiamin
biosynthesis in prokaryotes. Arch Microbiol 1999;171:293-300.

[3] Zhang Y, Taylor SV, Chiu HJ, Begley TP; Medline: 97284509 Characterization of the
Bacillus subtilis thiC operon involved in thiamine biosynthesis. J Bacteriol 1997;179:3030-
3035.



905. Putative tRNA binding domain (tRNA_bind) 

This domain is found in prokaryotic methionyl-tRNA synthetases, prokaryotic phenylalanyl-
tRNA synthetases, the yeast GU4 nucleic-binding protein (G4p1 or p42, ARC1) [2], human
tyrosyl-tRNA synthetase [1], and endothelial-monocyte activating polypeptide II. G4p1 binds
specifically to tRNA and forms a complex with methionyl-tRNA synthetases [2]. In human
tyrosyl-tRNA synthetase this domain may direct tRNA to the active site of the enzyme [2].
This domain may perform a common function in tRNA aminoacylation [1]. Number of
members: 12

[1] Kleeman TA, Wei D, Simpson KL, First EA; Medline: 97306356 Human tyrosyl-tRNA
synthetase shares amino acid sequence homology with a putative cytokine. J Biol Chem
1997;272:14420-14425.

[2] Simos G, Segref A, Fasiolo F, Hellmuth K, Shevchenko A, Mann M, Hurt EC; Medline:
97050848 The yeast protein Arc1p binds to tRNA and functions as a cofactor for the
methionyl- and glutamyl-tRNA synthetases. EMBO J 1996;15:5437-5448.



906. UbiA prenyltransferase family signature (UbiA)

The following prenyltransferases are evolutionarily related [1,2]:

- Bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA). 

- Yeast mitochondrial para-hydroxybenzoate-polyprenyltransferase (gene COQ2). 

- Protoheme IX farnesyltransferase (heme O synthase) from yeast and mammals (gene
COX10) and from bacteria (genes cyoE or ctaB). 

These proteins probably contain seven transmembrane segments. The best conserved region 
is located in a loop between the second and third of these segments and was used as a 
signature pattern. 

Consensus pattern: N-x(3)-[DE]-x(2)-[LIF]-D-x(2)-[VM]-x-R-[ST]-x(2)-R-x(4)-G
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Melzer M., Heide L. Biochim. Biophys. Acta 1212:93-102(1994). 
[ 2] Mogi T., Saiki K., Anraku Y. Mol. Microbiol. 14:391-398(1994). 

907. Uncharacterized protein family UPF0044 signature (UPF0044) 

The following uncharacterized proteins have been shown [1] to be highly similar: 

- Bacillus subtilis hypothetical protein yqeI.

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus 
influenzae protein. 

- Methanococcus jannaschii hypothetical protein MJ0652. 

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the 
following pattern. This pattern is located in the N-terminal part of these proteins. 

Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-
[GA]-x(2)-G
Sequences known to belong to this class detected by the pattern: ALL.

908. ATP synthase (C/AC39) subunit (vATP-synt_AC39) 

This family includes the AC39 subunit from vacuolar ATP synthase Swiss:P32366 [1], and 
the C subunit from archaebacterial ATP synthase [2]. The family also includes subunit C 
from the sodium-transporting ATP synthase from Enterococcus hirae Swiss:P43456 [3].
Number of members: 12 

[1] Bauerle C, Ho MN, Lindorfer MA, Stevens TH; Medline: 93286119 The Saccharomyces
cerevisiae VMA6 gene encodes the 36-kDa subunit of the vacuolar H(+)-ATPase membrane
sector. J Biol Chem 1993;268:12749-12757.

[2] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968
Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Gö1. J Biol Chem 1996;271:18843-18852.

[3] Takase K, Kakinuma S, Yamato I, Konishi K, Igarashi K, Kakinuma Y; Medline:
94209269 Sequencing and characterization of the ntp gene cluster for vacuolar-type Na(+)-
translocating ATPase of Enterococcus hirae. J Biol Chem 1994;269:11037-11044.

909. ATP synthase (E/31 kDa) subunit (vATP-synt_E)

This family includes the vacuolar ATP synthase E subunit [1], as well as the archaebacterial 
ATP synthase E subunit [2]. Number of members: 24 

[1] Foury F; Medline: 91009356 The 31-kDa polypeptide is an essential subunit of the
vacuolar ATPase in Saccharomyces cerevisiae. J Biol Chem 1990;265:18554-18560.

[2] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968
Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon
Methanosarcina mazei Gö1. J Biol Chem 1996;271:18843-18852.

910. (WW) 

The WW domain [1-4,E1] (also known as rsp5 or WWP) was originally discovered as a
short conserved region in a number of unrelated proteins, among them dystrophin, the product
of the gene responsible for Duchenne muscular dystrophy. The domain, which spans about 35
residues, is repeated up to 4 times in some proteins. It has been shown [5] to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp as well as that of a
conserved Pro. It is frequently associated with other domains typical for proteins in signal
transduction processes.

Proteins containing the WW domain are listed below.

- Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced form
consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a
cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms
tetramers and is thought to have multiple functions including involvement in membrane
stability, transduction of contractile forces to the extracellular environment and organization
of membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of
Duchenne or Becker type. Dystrophin contains one WW domain C-terminal of the spectrin-
repeats.

- Utrophin, a dystrophin-like protein of unknown function. 

- Vertebrate YAP protein is a substrate of an unknown serine kinase. It binds to the SH3
domain of the Yes oncoprotein via a proline-rich region. This protein appears in alternatively
spliced isoforms, containing either one or two WW domains [6].

- Mouse NEDD-4 plays a role in the embryonic development and differentiation of the
central nervous system. It contains 3 WW modules followed by a HECT domain. The human
ortholog contains 4 WW domains, but the third WW domain is probably spliced resulting in
an alternate NEDD-4 protein with only 3 WW modules [3].

- Yeast RSP5 is similar to NEDD-4 in its molecular organization. It contains an N-terminal 
C2 domain (see <PDOC00380>), followed by a histidine-rich region, 3 WW domains and a
HECT domain. 

- Rat FE65, a transcription-factor activator expressed preferentially in liver. The activator
domain is located within the N-terminal 232 residues of FE65, which also contain the WW
domain.

- Yeast ESS1/PTF1, a putative peptidyl prolyl cis-trans isomerase from family ppiC (see 
<PDOC00840>). A related protein, dodo (gene dod) exists in Drosophila and in mammals 
(gene PIN1). 

- Tobacco DB10 protein. The WW domain is located N-terminal to the region with similarity
to ATP-dependent RNA helicases.

- IQGAP, a human GTPase activating protein acting on ras. It contains an N-terminal
domain similar to fly muscle mp20 protein and a C-terminal ras GTPase activator domain.

- Yeast pre-mRNA processing protein PRP40, Caenorhabditis elegans ZK1098.1 and fission
yeast SpAC13C5.02 are related proteins with similarity to MYO2-type myosin, each
containing two WW-domains at the N-terminus.

- Caenorhabditis elegans hypothetical protein C38D4.5, which contains one WW module, a 
PH domain (see <PDOC50003>) and a C-terminal phosphatidylinositol 3-kinase domain. 

- Yeast hypothetical protein YFL010c.

For the sensitive detection of WW domains, a pattern was developed, as well as a profile
which spans the whole homology region.

Consensus pattern: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: 8.
Sequences known to belong to this class detected by the profile: ALL.
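
Because the WW domain can be repeated within one protein, a scan with the pattern may be used to count copies. The sketch below covers only the pattern, not the profile. The function name `ww_hits` is hypothetical, and `unit` is an illustrative WW-like test fragment rather than a sequence taken from this specification.

```python
import re

# W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P as a regex.
WW = re.compile(r"W.{9,11}[VFY][FYW].{6,7}[GSTNE][GSTQCR][FYW].{2}P")

def ww_hits(seq):
    """Return 1-based inclusive (start, end) coordinates of non-overlapping matches."""
    return [(m.start() + 1, m.end()) for m in WW.finditer(seq)]

# Illustrative fragment, duplicated with a short linker to mimic two copies.
unit = "GWEMAKTSSGQRYFLNHIDQTTTWQDPRK"
two_copies = unit + "SSSSS" + unit
print(ww_hits(two_copies))
```

Since the pattern also picks up a few unrelated SWISS-PROT sequences, a hit found this way would normally be confirmed with the profile.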

[ 1] Bork P., Sudol M. Trends Biochem. Sci 19:531-533(1994). 

[ 2] Andre B., Springael J.Y. Biochem. Biophys. Res. Commun. 205:1201-1205(1994).
[ 3] Hofmann K.O., Bucher P. FEBS Lett. 358:153-157(1995).
[ 4] Sudol M., Chen H.I., Bougeret C., Einbond A., Bork P. FEBS Lett. 369:67-71(1995).
[ 5] Chen H.I., Sudol M. Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823(1995).
[ 6] Sudol M., Bork P., Einbond A., Kastury K., Druck T., Negrini M., Huebner K., Lehman
D. J. Biol. Chem. 270:14733-14741(1995).

911. Xeroderma pigmentosum (XP) [1] (XPG_1) 

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease, characterized by a 
high incidence of sunlight-induced skin cancer. The skin cells of people with this condition are
hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. 
There are a minimum of seven genetic complementation groups involved in this pathway: 
XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG 
(or XPGC) [2]. 

XPG belongs to a family of proteins [2,3,4,5,6] that are composed of two main subsets:

- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13 from fission yeast.
RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in
human DNA nucleotide excision repair [9].

- Subset 2, to which belong mouse and human FEN-1, rad2 from fission yeast, and RAD27
from budding yeast. FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also includes: 

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway
that corrects mismatched base pairs.

- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.

- Yeast DIN7. 

Sequence alignment of this family of proteins reveals that similarities are largely confined to 
two regions. The first is located at the N-terminal extremity (N-region) and corresponds to 
the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the 
C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino 
acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the 
conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in 
XPG. The amino acids linking the N- and I-regions are not conserved; indeed, they are 
largely absent from proteins belonging to the second subset. 

Two signature patterns were developed for these proteins. The first corresponds to the central 
part of the N-region, the second to part of the I-region and includes the putative catalytic core 
pentapeptide. 

Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-
[CLM]
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994). 
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-
185(1993).

[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. 
Nucleic Acids Res. 21:1345-1349(1993). 

[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M.,
Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).

[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994). 

[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995). 

[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993). 

[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-
15968(1994).

[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432- 
435(1994). 

912. 5-formyltetrahydrofolate cyclo-ligase (5-FTHF_cyc-lig) 

5-formyltetrahydrofolate cyclo-ligase or methenyl-THF synthetase EC:6.3.3.2 catalyses the
conversion of 5-formyltetrahydrofolate (5-FTHF) to 5,10-methenyltetrahydrofolate; this
requires ATP and Mg2+ [1]. 5-FTHF is used in chemotherapy, where it is clinically known as
Leucovorin [2].
Number of members: 23

[1] Dayan A, Bertrand R, Beauchemin M, Chahla D, Mamo A, Filion M, Skup D, Massie B,
Jolivet J; Medline: 96096540 Cloning and characterization of the human 5,10-
methenyltetrahydrofolate synthetase-encoding cDNA. Gene 1995;165:307-311.

[2] Maras B, Stover P, Valiante S, Barra D, Schirch V; Medline: 94308074 Primary
structure and tetrahydropteroylglutamate binding site of rabbit liver cytosolic 5,10-
methenyltetrahydrofolate synthetase. J Biol Chem 1994;269:18429-18433.

913. Cytosolic long-chain acyl-CoA thioester hydrolase (Acyl-CoA_hydro)

This family consists of various cytosolic long-chain acyl-CoA thioester hydrolases including
those of human and rat [1,2]. The aligned region is repeated within the sequence of human and
rat cytosolic long-chain acyl-CoA thioester hydrolases of this family. Long-chain acyl-CoA
hydrolases hydrolyse palmitoyl-CoA to CoA and palmitate; they also catalyse the hydrolysis
of other long chain fatty acyl-CoA thioesters. Long-chain acyl-CoA hydrolases are present in
all living organisms and they may provide a mechanism for the control of lipid metabolism
[1].
Number of members: 24

[1] Yamada J, Furihata T, Iida N, Watanabe T, Hosokawa M, Satoh T, Someya A, Nagaoka I,
Suga T; Medline: 97236308 Molecular cloning and expression of cDNAs encoding rat brain
and liver cytosolic long-chain acyl-CoA hydrolases. Biochem Biophys Res Commun
1997;232:198-203.

[2] Broustas CG, Larkins LK, Uhler MD, Hajra AK; Medline: 96209964 Molecular cloning
and expression of cDNA encoding rat brain cytosolic acyl-coenzyme A thioester hydrolase.
J Biol Chem 1996;271:10470-10476.

914. Agglutinin 

Lectin (probable mannose binding) 

Members of this family are plant lectins. Many if not all are mannose specific. 
Number of members: 87 

[1] Wright CS, Hester G; Medline: 97094989 The 2.0 A structure of a cross-linked complex 
between snowdrop lectin and a branched mannopentaose: evidence for two unique binding 
modes. Structure 1996;4:1339-1352.

915. (ANF_RECEPTORS) 

Natriuretic peptides are hormones involved in the regulation of fluid and electrolyte 
homeostasis. These hormones stimulate the intracellular production of cyclic GMP as a 
second messenger. 

Currently, three types of natriuretic peptide receptors are known [1,2]. Two express guanylate 
cyclase activity: GC-A (or ANP-A) which seems specific to atrial natriuretic peptide (ANP), 
and GC-B (or ANP-B) which seems to be stimulated more effectively by brain natriuretic 
peptide (BNP) than by ANP. The third receptor (ANP-C) is probably responsible for the
clearance of ANP from the circulation and does not play a role in signal transduction. 

GC-A and GC-B are plasma membrane-bound proteins that share the following topology: an 
N-terminal extracellular domain which acts as the ligand binding region, then a 
transmembrane domain followed by a large cytoplasmic C-terminal region that can be
subdivided into two domains: a protein kinase-like domain (see <PDOC00100>) that appears 
important for proper signalling and a guanylate cyclase catalytic domain (see 
<PDOC00425>). The topology of ANP-C is different: like GC-A and -B it possesses an 
extracellular ligand-binding region and a transmembrane domain, but its cytoplasmic domain 
is very short. 

A pattern was developed from the ligand-binding region of natriuretic peptide receptors based 
on a highly conserved region located in the N-terminal part of the domain. 

Consensus pattern: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Garbers D.L. New Biol. 2:499-504(1990). 

[ 2] Schulz S., Chinkers M., Garbers D.L. FASEB J. 2:2026-2035(1989).

916. (Apocytochrome)

Cytochrome c family heme-binding site signature 

In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by
thioether bonds to two conserved cysteine residues. The consensus sequence for this site is
Cys-X-X-Cys-His and the histidine residue is one of the two axial ligands of the heme iron.
This arrangement is shared by all proteins known to belong to the cytochrome c family, which
presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c.

Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
Sequences known to belong to this class detected by the pattern: ALL, except for four
cytochrome c's which lack the first thioether bond.
Other sequence(s) detected in SWISS-PROT: 454.

Note: some cytochrome c's have more than a single bound heme group: c4 has 2, c7 has 3, c3
has 4, the reaction center has 4, and cc3/Hmc has 16!
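
Since a single protein may carry several heme-binding sites, it can be useful to list every position at which the signature occurs. The following is a minimal sketch, assuming one-letter sequences. The function name `heme_sites` is hypothetical and the test sequences are illustrative fragments. Residues in braces are excluded, which corresponds to a negated character class in the regular expression.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW}: braces list residues that are NOT allowed.
# A lookahead is used so that every candidate start position is tested.
HEME_SITE = re.compile(r"(?=C[^CPWHF][^CPWR]CH[^CFYW])")

def heme_sites(seq):
    """Return 1-based positions of the first cysteine of each matching site."""
    return [m.start() + 1 for m in HEME_SITE.finditer(seq)]

# Illustrative fragments: one with a single CxxCH site, one with two sites.
print(heme_sites("GDVEKGKKIFVQKCAQCHTVEKGGKHKTGPNLHG"))
print(heme_sites("AACAQCHTAAACGKCHSAA"))
```

As noted above, the pattern also matches several hundred other SWISS-PROT sequences, so the number of hits is only an upper estimate of the number of bound heme groups.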

[ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985). 

917. ATP-synt_A-c. ATP synthase Alpha chain, C terminal 

[1] Medline: 94344236. Structure at 2.8 A resolution of F1-ATPase from bovine heart
mitochondria. Abrahams JP, Leslie AG, Lutter R, Walker JE; Nature 1994;370:621-628. 
Number of members: 125 

918. (Basic) 

Myc-type, 'helix-loop-helix' dimerization domain signature 
HELIX_LOOP_HELIX 

A number of eukaryotic proteins, which probably are sequence-specific DNA-binding
proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid
residues. It has been proposed [1] that this domain is formed of two amphipathic helices
joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH)
domain mediates protein dimerization and has been found in the proteins listed below
[2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid
residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred
to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on
the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or
heterodimerization mediated by the HLH domain is independent of, but necessary for, DNA
binding, as two basic regions are required for DNA binding activity. The HLH proteins
lacking the basic domain (Emc, Id) function as negative regulators since they form
heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also
repress transcription although they can bind DNA. The proteins of this subfamily act together
with co-repressor proteins, like groucho, through their C-terminal motif WRPW.

- The myc family of cellular oncogenes [4], which is currently known to contain four
members: c-myc [E3], N-myc, L-myc, and B-myc. The myc genes are thought to play a role
in cellular differentiation and proliferation.

- Proteins involved in myogenesis (the induction of muscle cells). In mammals MyoD1
(Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in birds CMD1 (QMF-1),
in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila
nautilus (nau).

- Vertebrate proteins that bind specific DNA sequences ('E boxes') in various
immunoglobulin chain enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4),
TFE3, and TFEB.

- Vertebrate neurogenic differentiation factor 1 that acts as differentiation factor during 
neurogenesis. 

- Vertebrate MAX protein, a transcription regulator that forms a sequence-specific
DNA-binding protein complex with myc or mad.

- Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a transcriptional 
repressor and may antagonize myc transcriptional activity by competing for max. 

- Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, 
AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2), 
hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal pas domain proteins 
(NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2, and 
human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator
(ARNT), trachealess protein (TRH), and similar protein (SIMA). 

- Mammalian transcription factors HES, which repress transcription by acting on two types 
of DNA sequences, the E box and the N box. 

- Mammalian MAD protein (max dimerizer) which acts as transcriptional repressor and may 
antagonize myc transcriptional activity by competing for max. 

- Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a 
symmetrical DNA sequence that is found in a variety of viral and cellular promoters. 

- Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.

- Human transcription factor AP-4. 

- Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E box-dependent
transcription in collaboration with E47.

- Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an
important role in hemopoietic differentiation. SCL is involved, by chromosomal
translocation, in stem-cell leukemia.

- Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic
DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby
inhibiting binding to DNA.

- Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ
patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the
homolog of mammalian Id proteins.

- Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator 
that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the 
LDLR gene and in other genes. 

- Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and 
T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors 
in the peripheral nervous system and the central nervous system. 

- Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. 

- Drosophila atonal protein (ato) which is involved in neurogenesis. 

- Drosophila daughterless (da) protein, which is essential for neurogenesis and sex- 
determination. 

- Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of 
neurons. 

- Drosophila delilah (dei) protein, which plays an important role in the differentiation of
epidermal cells into muscle.

- Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic 
segmentation and adult bristle patterning. 

- Drosophila enhancer of split proteins E(spl), which are hairy-like proteins active during
neurogenesis and also act as transcriptional repressors.

- Drosophila twist (twi) protein, which is involved in the establishment of germ layers in 
embryos. 

- Maize anthocyanin regulatory proteins R-S and LC. 

- Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in
chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromeres
and in several promoters.

- Yeast INO2 and INO4 proteins.

- Yeast phosphate system positive regulatory protein PHO4 which interacts with the
upstream activating sequence of several acid phosphatase genes.

- Yeast serine-rich protein TYE7 that is required for Ty-mediated ADH2 expression.

- Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for
phosphorus acquisition.

- Fission yeast protein esc1 which is involved in the sexual differentiation process.

The schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx             xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1       Loop       Amphipathic helix 2

The signature pattern that was developed to detect this domain completely spans the
second amphipathic helix.

Consensus pattern: [DENSTAP]-[K^]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-[LIVM]-
x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]
Sequences known to belong to this class detected by the pattern: the majority, but far from all.
Other sequence(s) detected in SWISS-PROT: 135.
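
The E-box core 'CANNTG' described above, where N is any nucleotide, lends itself to a simple scan of a DNA sequence. The following is a minimal sketch under the assumption that only the four unambiguous bases are considered and only the given strand is searched; the function name find_e_boxes is hypothetical.

```python
import re

# CANNTG with N restricted to A, C, G or T. The lookahead lets
# overlapping sites (for example CACATGTG) be reported separately.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_e_boxes(dna: str):
    """Return (0-based start, hexamer) for every E-box-like site in dna."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]
```

Because the core is palindromic in its fixed positions (CA..TG), a site found on one strand corresponds to a CANNTG site at the same location on the complementary strand. Whether a given site is bound by a particular bHLH dimer is not something this scan can determine.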

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).

[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991).

[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).

[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).

919. (Beta-lactamase) 

Beta-lactamases classes -A, -C, and -D active site

Beta-lactamases (EC 3.5.2.6) [1,2] are enzymes which catalyze the hydrolysis of an amide
bond in the beta-lactam ring of antibiotics belonging to the penicillin/cephalosporin
family. Four kinds of beta-lactamase have been identified [3]. Class-B enzymes are
zinc-containing proteins whilst class-A, C and D enzymes are serine hydrolases. The three
classes of serine beta-lactamases are evolutionarily related and belong to a superfamily [4]
that also includes DD-peptidases and a variety of other penicillin-binding proteins (PBP's).
All these proteins contain a Ser-x-x-Lys motif, where the serine is the active site residue.
Although clearly homologous, the sequences of the three classes of serine beta-lactamases
exhibit a large degree of variability and only a small number of residues are conserved in
addition to the catalytic serine.

Since a pattern detecting all serine beta-lactamases would also pick up many unrelated 
sequences, it was decided to provide specific patterns, centered on the active site serine, for 
each of the three classes. 

Consensus pattern: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC] [S is the active
site residue]
Sequences known to belong to this class detected by the pattern: ALL class-A
beta-lactamases.
Other sequence(s) detected in SWISS-PROT: 7.

Consensus pattern: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K [The first S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL class-C
beta-lactamases.
Other sequence(s) detected in SWISS-PROT: NONE.

Consensus pattern: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI] [S is the active site
residue]
Sequences known to belong to this class detected by the pattern: ALL class-D
beta-lactamases.
Other sequence(s) detected in SWISS-PROT: NONE.
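
Since each of the three patterns above is centered on the active site serine, one way to use them together is to test a sequence against all three and report which class signature matches and where the serine lies. The sketch below is illustrative only: the regular expressions are hand translations of the three consensus patterns, and the names CLASS_PATTERNS, SERINE_OFFSET and classify_serine_beta_lactamase are hypothetical.

```python
import re

# Hand-translated regular expressions for the class-A, -C and -D patterns.
CLASS_PATTERNS = {
    "A": re.compile(r"[FY].[LIVMFY].S[TV].K.{4}[AGLM].{2}[LC]"),
    "C": re.compile(r"FE[LIVM]GS[LIVMG][SA]K"),
    "D": re.compile(r"[PA].S[ST]FK[LIV][PAL].[STA][LI]"),
}

# Offset of the active-site serine from the start of each match.
SERINE_OFFSET = {"A": 4, "C": 4, "D": 2}

def classify_serine_beta_lactamase(protein: str) -> dict:
    """Map each matching class to the 0-based index of its active-site serine."""
    protein = protein.upper()
    hits = {}
    for cls, regex in CLASS_PATTERNS.items():
        match = regex.search(protein)
        if match:
            hits[cls] = match.start() + SERINE_OFFSET[cls]
    return hits
```

An empty result means that none of the three signatures was found, which, as the entry explains, does not by itself rule out a class-B (zinc-containing) enzyme or another penicillin-binding protein. The short fragments used to exercise the function are illustrative inputs, not entries from the sequence tables.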

[ 1] Ambler R.P. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 289:321-331(1980).

[ 2] Pastor N., Pinero D., Valdes A.M., Soberon X. Mol. Microbiol. 4:1957-1965(1990).

[ 3] Bush K. Antimicrob. Agents Chemother. 33:259-263(1989).

[ 4] Joris B., Ghuysen J.-M., Dive G., Renard A., Dideberg O., Charlier P., Frere J.M., Kelly
J.A., Boyington J.C., Moews P.C., Knox J.R. Biochem. J. 250:313-324(1988).

920. Biotin protein ligase (BPL) 

Biotin is covalently attached at the active site of certain enzymes that transfer carbon dioxide 
from bicarbonate to organic acids to form cellular metabolites. Biotin protein ligase (BPL) is 
the enzyme responsible for attaching biotin to a specific lysine at the active site of biotin 
enzymes. Each organism probably has only one BPL. Biotin attachment is a two-step
reaction that results in the formation of an amide linkage between the carboxyl group of
biotin and the epsilon-amino group of the modified lysine [2].
Number of members: 26

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Medline: 93028443
"Escherichia coli biotin holoenzyme synthetase/bio repressor crystal structure delineates the
biotin- and DNA-binding domains." Proc Natl Acad Sci USA 1992;89:9257-9261.

[2] Chapman-Smith A, Cronan JE Jr; Medline: 10470036 "The enzymatic biotinylation of
proteins: a post-translational modification of exceptional specificity." Trends Biochem Sci
1999;24:359-363.

921. (BRCA2_repeat) 

The alignment covers only the most conserved region of the repeat.

[1] Bork P, Blomberg N, Nilges M; Medline: 96241568 "Internal repeats in the BRCA2
protein sequence." Nat Genet 1996;13:22-23.

Number of members: 63 

922. (C6) 

This domain of unknown function is found in the C. elegans protein Swiss:Q19522. It is
presumed to be an extracellular domain. The C6 domain contains six conserved cysteine
residues in most copies of the domain. However, some copies of the domain are missing
cysteine residues 1 and 3, suggesting that these form a disulphide bridge.

Number of members: 23

923. Cadherin cytoplasmic region (Cadherin_C_term)

Cadherins are vital in cell-cell adhesion during tissue differentiation. Cadherins are linked to
the cytoskeleton by catenins. Catenins bind to the cytoplasmic tail of the cadherin. Cadherins
cluster to form foci of homophilic binding units. A key determinant of the strength of the
binding that is mediated by cadherins is the juxtamembrane region of the cadherin. This
region induces clustering and also binds to the protein p120ctn [1].
Number of members: 59

[1] Yap AS, Niessen CM, Gumbiner BM; Medline: 98234411 "The juxtamembrane region of
the cadherin cytoplasmic tail supports lateral clustering, adhesive strengthening, and
interaction with p120ctn." J Cell Biol 1998;141:779-789.

[2] Barth AI, Nathke IS, Nelson WJ; Medline: 97471931 "Cadherins, catenins and APC
protein: interplay between cytoskeletal complexes and signaling pathways." Curr Opin Cell
Biol 1997;9:683-690.

[3] Braga VM, Machesky LM, Hall A, Hotchin NA; Medline: 97327766 "The small GTPases
Rho and Rac are required for the establishment of cadherin-dependent cell-cell contacts." J
Cell Biol 1997;137:1421-1431.

924. Clathrin propeller repeat (Clathrin_propel) 

Clathrin is the scaffold protein of the basket-like coat that surrounds coated vesicles. The 
soluble assembly unit, a triskelion, contains three heavy chains and three light chains in an 
extended three-legged structure. Each leg contains one heavy and one light chain. The N- 
terminus of the heavy chain is known as the globular domain, and is composed of seven 
repeats which form a beta propeller [1]. 
Number of members: 61 

[1] ter Haar E, Musacchio A, Harrison SC, Kirchhausen T; Medline: 99043510 "Atomic
structure of clathrin: a beta propeller terminal domain joins an alpha zigzag linker." Cell
1998;95:563-573.

925. Respiratory-chain NADH dehydrogenase 30 Kd subunit signature (complex1_30Kd)

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the 
inner mitochondrial membrane which also seems to exist in the chloroplast and in 
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide 
subunits of this bioenergetic enzyme complex there is one with a molecular weight of 30 
Kd (in mammals) which has been found to be: 

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and in Neurospora
crassa.

- Mitochondrial encoded in Paramecium (protein P1), and in the slime mold Dictyostelium
discoideum (ORF 209).

- Chloroplast encoded in various higher plants (ORF 159).

It is also present in bacteria:

- In the cyanobacteria Synechocystis strain PCC 6803 (gene ndhJ).

- Subunit C of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoC).

- Subunit NQO5 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

This protein, in its mature form, consists of 157 to 266 amino acid residues. The
best conserved region is located in the C-terminal section and can be used as a signature
pattern.

Consensus pattern: E-R-E-x(2)-[DE]-[LIVMFY](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-
[LIVMYS]
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

926. Respiratory-chain NADH dehydrogenase 49 Kd subunit signature (complex1_49Kd)

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the 
inner mitochondrial membrane which also seems to exist in the chloroplast and in 
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide 
subunits of this bioenergetic enzyme complex there is one with a molecular weight of 49 Kd 
(in mammals), which is the third largest subunit of complex I and is a component of the 
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 49 
Kd subunit has been found to be: 

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and in Neurospora 
crassa. 

- Mitochondrial encoded in protozoan such as Paramecium (ORF 400), Leishmania and 
Trypanosoma (MURF 3). 

- Chloroplast encoded in various higher plants (ORF 392).

The 49 Kd subunit is highly similar to [3,4]:

- Subunit D of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoD).

- Subunit NQO4 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit 5 of Escherichia coli formate hydrogenlyase (gene hycE). 

- Subunit G of Escherichia coli hydrogenase-4 (gene hyfG). 

A highly conserved region, located in the N-terminal section of this subunit, was selected
as a signature pattern.

Consensus pattern: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMTN]-x-E-x-[KRQ]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol.
233:109-122(1993).

927. (COX2) 

Cytochrome c oxidase (EC 1.9.3.1) [1,2] is an oligomeric enzymatic complex which is a
component of the respiratory chain and is involved in the transfer of electrons from
cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial
inner membrane; in aerobic prokaryotes it is found in the plasma membrane. The enzyme
complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals).

Subunit 2 (CO II) transfers the electrons from cytochrome c to the catalytic subunit 1. It 
contains two adjacent transmembrane regions in its N-terminus and the major part of the 
protein is exposed to the periplasmic or to the mitochondrial intermembrane space, 
respectively. CO II provides the substrate-binding site and contains a copper center called
Cu(A), probably the primary acceptor in cytochrome c oxidase. An exception is the
corresponding subunit of the cbb3-type oxidase which lacks the copper A redox-center.
Several bacterial CO II have a C-terminal extension that contains a covalently bound heme c.

It has been shown [3,4] that nitrous oxide reductase (EC 1.7.99.6) (gene nosZ) of
Pseudomonas has sequence similarity in its C-terminus to CO II. This enzyme is part of the
bacterial respiratory system which is activated under anaerobic conditions in the presence of
nitrate or nitrous oxide. NosZ is a periplasmic homodimer that contains a dinuclear copper
center, probably located in a 3-dimensional fold similar to the cupredoxin-like fold that has
been suggested for the copper-binding site of CO II [3].

The dinuclear purple copper center is formed by 2 histidines and 2 cysteines [5]. This region 
was used as a signature pattern. The conserved valine and the conserved methionine are said 
to be involved in stabilizing the copper-binding fold by interacting with each other. 

Consensus pattern: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M [The two C's and two H's are
copper ligands]
Sequences known to belong to this class detected by the pattern: ALL, except
for Paramecium primaurelia as well as in some plants where the pattern ends with Thr; an
RNA editing event at this position could change this Thr to Met.

Note: cytochrome cbb(3) subunit 2 does not belong to this family. 
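
The CO II signature pattern marks its two histidines and two cysteines as copper ligands, so a match can be used to report the positions of those four residues. The following is a minimal sketch, assuming a hand-translated regular expression with one capture group per ligand; the function name copper_ligand_positions and the allow_thr_end switch (which accepts the Thr-ending variant mentioned for some plants) are hypothetical.

```python
import re

def copper_ligand_positions(protein: str, allow_thr_end: bool = False):
    """Return 0-based positions of the H, C, C, H copper ligands, or None.

    The regular expression is a hand translation of
    V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M; with allow_thr_end=True the
    final Met may also be Thr (the unedited form described in the text).
    """
    last = "[MT]" if allow_thr_end else "M"
    regex = re.compile(r"V.(H).{33,40}(C).{3}(C).{3}(H).{2}" + last)
    match = regex.search(protein.upper())
    if match is None:
        return None
    # Groups 1 to 4 capture the two histidines and two cysteines.
    return [match.start(i) for i in range(1, 5)]
```

The reported positions are only as reliable as the pattern match itself; a hit says nothing about whether copper is actually bound in a given protein.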

[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135- 
148(1983). 

[ 2] Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. J. Bacteriol.
176:5587-5600(1994).

[ 3] van der Oost J., Lappalainen P., Musacchio A., Warne A., Lemieux L., Rumbley J.,
Gennis R.B., Aasa R., Pascher T., Malmstrom B.G., Saraste M. EMBO J.
11:3209-3217(1992).

[ 4] Zumft W.G., Dreutsch A., Loechelt S., Cuypers H., Friedrich B., Schneider B. Eur. J. 
Biochem. 208:31-40(1992). 

928. Cytochrome C assembly protein (CytC_asm) 

This family consists of various proteins involved in cytochrome c assembly from
mitochondria and bacteria: CycK from Rhizobium [3], CcmC from E. coli and Paracoccus
denitrificans [2,1] and orf240 from wheat mitochondria [4]. The members of this family are
probably integral membrane proteins with six predicted transmembrane helices. It has been
proposed that members of this family comprise a membrane component of an ABC (ATP
binding cassette) transporter complex. It is also proposed that this transporter is necessary for
transport of some component needed for cytochrome c assembly. One member, CycK,
contains a putative heme-binding motif [3]; orf240 also contains a putative heme-binding
motif and is a proposed ABC transporter with c-type heme as its proposed substrate [4].
However, it seems unlikely that all members of this family transport heme or c-type
apocytochromes because CcmC in the putative CcmABC transporter transports neither [1].
Number of members: 67

[1] Page D, Pearce DA, Norris HA, Ferguson SJ; Medline: 97195802 "The Paracoccus
denitrificans ccmA, B and C genes: cloning and sequencing, and analysis of the potential of
their products to form a haem or apo-c-type cytochrome transporter." MICROBIOLOGY
1997;143:563-576.

[2] Thoeny-Meyer L, Fischer F, Kunzler P, Ritz D, Hennecke H; Medline: 95362656
"Escherichia coli genes required for cytochrome c maturation." J. BACTERIOL
1995;177:4321-4326.

[3] Delgado MJ, Yeoman KH, Wu G, Vargas C, Davies A, Poole RK, Johnston AWB,
Downie JA; Medline: 95394794 "Characterization of the cycHJKL genes involved in
cytochrome c biogenesis and symbiotic nitrogen fixation in Rhizobium leguminosarum." J.
BACTERIOL 1995;177:4927-4934.

[4] Bonnard G, Grienenberger JM; Medline: 95124303 "A gene proposed to encode a
transmembrane domain of an ABC transporter is expressed in wheat mitochondria." MOL.
GEN. GENET 1995;246:91-99.

929. Cytochrome b559 subunits heme-binding site signature (cytochr_b559) 

Cytochrome b559 [1] is an essential component of the photosystem II complex from oxygenic
photosynthetic organisms. It is an integral thylakoid membrane protein composed of two
subunits, alpha (gene psbE) and beta (gene psbF), each of which contains a histidine residue
located in a transmembrane region. The two histidines coordinate the heme iron of
cytochrome b559.

The region around the heme-binding residue of both subunits is very similar and can be used
as a signature pattern.

Consensus pattern: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-[LIV]-[STGA]-[IV]-P
[H is the heme iron ligand]
Sequences known to belong to this class detected by the pattern: ALL.
Other sequence(s) detected in SWISS-PROT: NONE.

[ 1] Pakrasi H.B., de Ciechi P., Whitmarsh J. EMBO J. 10:1619-1627(1991).

930. Cytochrome b/b6 signatures (Cytochrome_b)

In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component
of respiratory chain complex III (EC 1.10.2.2), also known as the bc1 complex or
ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an
analogous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase
(EC 1.10.99.1), also known as the b6f complex.

Cytochrome b/b6 [1,2] is an integral membrane protein of approximately 400 amino acid 
residues that probably has 8 transmembrane segments. In plants and cyanobacteria, 
cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence 
of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD 
corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two heme groups, 
known as b562 and b566. Four conserved histidine residues are postulated to be the ligands 
of the iron atoms of these two heme groups. 

Apart from regions around some of the histidine heme ligands, there are a few conserved 
regions in the sequence of b/b6. The best conserved of these regions includes an invariant P- 
E-W triplet which lies in the loop that separates the fifth and sixth transmembrane segments. 
It seems to be important for electron transfer at the ubiquinone redox site - called Qz or Qo 
(where o stands for outside) - located on the outer side of the membrane. 

A schematic representation of the structure of cytochrome b/b6 is shown below.

           +---Fe-b562----+
           | +--Fe-b566---|-+
           | |            | |
xxxxxxxxxxxHxHxxxxxxxxxxxxHxHxxxxxxxxxxPEWxxxxxxxxxxxxxxxxxx
<-----------------------Cytochrome-b----------------------->
<--------Cytochrome-b6-petB--------><--Cytochrome-b6-petD-->

Two signature patterns were developed for cytochrome b/b6. The first includes the first 
conserved histidine of b/b6, which is a heme b562 ligand; the second includes the conserved 
PEW triplet. 

Consensus pattern: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H [H is a heme b562
ligand]
Sequences known to belong to this class detected by the pattern: ALL, except for 5
sequences.

Consensus pattern: P-[DE]-W-[FY]-[LFY](2)
Sequences known to belong to this class detected by the pattern: ALL, except for
Odocoileus hemionus (mule deer) and Paramecium tetraurelia cytochrome b.

[ 1] Howell N. J. Mol. Evol. 29:157-169(1989). 

[ 2] Esposti M.D., de Vries S., Crimi M., Ghelli A., Patarnello T., Meyer A. Biochim.
Biophys. Acta 1143:243-271(1993).

931. Phorbol esters / diacylglycerol binding domain (DAG_PE-bind)

Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues 
of DAG and potent tumor promoters that cause a variety of physiological changes when 
administered to both cells and tissues. DAG activates a family of serine/threonine protein 
kinases, collectively known as protein kinase C (PKC) [1]. Phorbol esters can directly 
stimulate PKC. The N-terminal region of PKC, known as C1, has been shown [2] to bind PE
and DAG in a phospholipid and zinc-dependent fashion. The C1 region contains one or two
copies (depending on the isozyme of PKC) of a cysteine-rich domain about 50 amino-acid 
residues long and essential for DAG/PE-binding. Such a domain has also been found in the 
following proteins: 

- Diacylglycerol kinase (EC 2.7.1.107) (DGK) [3], the enzyme that converts DAG into
phosphatidate. It contains two copies of the DAG/PE-binding domain in its N-terminal
section. At least five different forms of DGK are known in mammals.

- N-chimaerin. A brain-specific protein which shows sequence similarities with the BCR
protein at its C-terminal part and contains a single copy of the DAG/PE-binding domain at its
N-terminal part. It has been shown [4,5] to be able to bind phorbol esters.

- The raf/mil family of serine/threonine protein kinases. These protein kinases contain a 
single N-terminal copy of the DAG/PE-binding domain. 

- The unc-13 protein from Caenorhabditis elegans. Its function is not known but it contains a
copy of the DAG/PE-binding domain in its central section and has been shown to bind
specifically to a phorbol ester in the presence of calcium [6].

- The vav oncogene. Vav was generated by a genetic rearrangement during gene transfer
assays. Its expression seems to be restricted to cells of hematopoietic origin. Vav seems [5,7]
to contain a DAG/PE-binding domain in the central part of the protein.

- The Drosophila GTPase activating protein rotund. 

The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are 
probably the six cysteines and two histidines that are conserved in this domain. A signature 
pattern was developed that spans completely the DAG/PE domain. 

Consensus pattern: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-
x(4)-[HD]-x(2)-C-x(5,9)-C [All the C and H are involved in binding zinc]
Sequences known to belong to this class detected by the pattern: ALL, except a few DGK's.
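
Because every fixed C and H in the DAG/PE pattern is described as a zinc ligand, a match can be reduced to a list of eight ligand positions. The sketch below assumes the two variable-length gaps are x(8,11) and x(5,10) and uses a hand-translated regular expression; the function name zinc_ligand_positions is hypothetical.

```python
import re

# One capture group per zinc ligand: H, C, C, C, C, [HD], C, C.
DAG_PE = re.compile(
    r"(H).[LIVMFYW].{8,11}(C).{2}(C).{3}[LIVMFC].{5,10}"
    r"(C).{2}(C).{4}([HD]).{2}(C).{5,9}(C)"
)

def zinc_ligand_positions(protein: str):
    """Return 0-based positions of the eight candidate zinc ligands, or None."""
    match = DAG_PE.search(protein.upper())
    if match is None:
        return None
    return [match.start(i) for i in range(1, 9)]
```

Note that the sixth position may be Asp rather than His in the pattern, so a match does not always yield exactly six cysteines and two histidines.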

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).

[ 2] Ono Y., Fujii T., Igarashi K., Kuno T., Tanaka C., Kikkawa U., Nishizuka Y. Proc. Natl.
Acad. Sci. U.S.A. 86:4868-4871(1989).

[ 3] Sakane F., Yamada K., Kanoh H., Yokoyama C., Tanabe T. Nature 344:345-348(1990).

[ 4] Ahmed S., Kozma R., Monfries C., Hall C., Lim H.H., Smith P., Lim L. Biochem. J.
272:767-773(1990).

[ 5] Ahmed S., Kozma R., Lee J., Monfries C., Harden N., Lim L. Biochem. J.
280:233-241(1991).

[ 6] Ahmed S., Maruyama I.N., Kozma R., Lee J., Brenner S., Lim L. Biochem. J.
287:995-999(1992).

[ 7] Boguski M.S., Bairoch A., Attwood T.K., Michaels G.S. Nature 358:113-113(1992).

932. 3-dehydroquinate synthase (DHQ_synthase)

The 3-dehydroquinate synthase EC:4.6.1.3 domain is present in isolation in various bacterial
3-dehydroquinate synthases and also present as a domain in the pentafunctional AROM
polypeptide Swiss:P07547 [2]. 3-dehydroquinate (DHQ) synthase catalyses the formation of
dehydroquinate (DHQ) and orthophosphate from 3-deoxy-D-arabino heptulosonic 7
phosphate [1]. This reaction is part of the shikimate pathway which is involved in the
biosynthesis of aromatic amino acids.
Number of members: 25

[1] Barten R, Meyer TF; Medline: 98273626 "Cloning and characterisation of the Neisseria
gonorrhoeae aroB gene." Mol Gen Genet 1998;258:34-44.

[2] Hawkins AR, Lamb HK; Medline: 96048023 "The molecular biology of multidomain
proteins. Selected examples." Eur J Biochem 1995;232:7-18.

933. Dihydrofolate reductase signature (DiHfolate_red) 

Dihydrofolate reductases (EC 1.5.1.3) [1] are ubiquitous enzymes which catalyze the
reduction of folic acid into tetrahydrofolic acid. They can be inhibited by a number of
antagonists such as trimethoprim and methotrexate which are used as antibacterial or
anticancer agents. A signature pattern was derived from a region in the N-terminal part of
these enzymes, which includes a conserved Pro-Trp dipeptide; the tryptophan has been
shown [2] to be involved in the binding of substrate by the enzyme.

Consensus pattern: [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-
x(3)-[STIQ]
Sequences known to belong to this class detected by the pattern: ALL, except for
type II bacterial, plasmid-encoded, dihydrofolate reductases which do not belong to the same
class of enzymes.

[ 1] Harper's Review of Biochemistry, Lange, Los Altos (1985).

[ 2] Bolin J.T., Filman D.J., Matthews D.A., Hamlin R.C., Kraut J. J. Biol. Chem.
257:13650-13662(1982).



934. (DIL)

[1] Ponting CP; Medline: 95397417 "AF-6/cno: neither a kinesin nor a myosin, but a bit of
both." Trends Biochem Sci 1995;20:265-266.

Number of members: 31

935. (DNAgyraseBC) 

DNA topoisomerase II signature (cross-reference = TOPOISOMERASE_II) 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP-
dependent and act by passing a DNA segment through a transient double-strand break.
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African
Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits
(the product of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known
as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a
second type II topoisomerase has been identified; it is known as topoisomerase IV and is
required for chromosome segregation. It also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

There are many regions of sequence homology between the different subtypes of
topoisomerase II. The relation between the different subunits is shown in the following
representation:

<---------------About-1400-residues-------------->
[-----Protein 39-*-----][-------Protein 52-------]  Phage T4
[-----gyrB-------*-----][---------gyrA-----------]  Prokaryote II, Archaebacteria
[-----parE-------*-----][---------parC-----------]  Prokaryote IV
[----------------*-------------------------------]  Eukaryote and ASF

'*': Position of the pattern.

As a signature pattern for this family of proteins, a region was selected that contains a highly 
conserved pentapeptide. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 






Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]
Sequences known to belong to this class detected by the pattern: ALL.



[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). 

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).



936. (DNA_topoisoIV)

DNA topoisomerase II signature (cross-reference = TOPOISOMERASE_II) 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that 
catalyze the interconversion of topological DNA isomers. Type II topoisomerases are 
ATP-dependent and act by passing a DNA segment through a transient double-strand break. 
Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African 
Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits 
(the products of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known 
as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a 
second type II topoisomerase has been identified; it is known as topoisomerase IV and is 
required for chromosome segregation. It also consists of two subunits (genes parC and parE). 
In eukaryotes, type II topoisomerase is a homodimer. 

There are many regions of sequence homology between the different subtypes of 
topoisomerase II. The relation between the different subunits is shown in the following 
representation: 



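<-------------- About 1400 residues -------------->

[------ Protein 39 --*--][------ Protein 52 ------]   Phage T4

[------ gyrB --------*--][------ gyrA ------------]   Prokaryote II, Archaebacteria

[------ parE --------*--][------ parC ------------]   Prokaryote IV

[--------------------*----------------------------]   Eukaryote and ASF

'*': Position of the pattern. 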

As a signature pattern for this family of proteins, a region was selected that contains a highly 
conserved pentapeptide. The pattern is located in gyrB, in parE, and in protein 39 of phage 
T4 topoisomerase. 

Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] 
Sequences known to belong to this class detected by the pattern: ALL. 

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). 

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995). 

937. Prolyl oligopeptidase family serine active site (DPPIV_N_term) 

The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related 
peptidases whose catalytic activity seems to be provided by a charge relay system similar to 
that of the trypsin family of serine proteases, but which arose by independent convergent 
evolution. The known members of this family are listed below. 

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is 
an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence 
of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium 
meningosepticum and Aeromonas hydrophila); there is a high degree of sequence 
conservation between these sequences. 

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves 
peptide bonds on the C-terminal side of lysyl and argininyl residues. 

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N- 
terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided 
that the penultimate residue is proline. 

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible 
for the proteolytic maturation of the alpha-factor precursor. 

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2). 

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme 
catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to 
generate a N-acetylated amino acid and a protein with a free amino-terminus. 

A conserved serine residue has been shown experimentally (in E. coli protease II as well as in 
pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part 
of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the 
C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 
amino acids). 

Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue] 
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A. 
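
Because the pattern marks a specific active-site residue, a match can also be used to report where that serine sits relative to the C-terminus. The following is a minimal sketch under stated assumptions: the regular expression is a hand translation of the consensus pattern above, the capturing parentheses around S are added here for illustration, and the function name and toy sequence are hypothetical.

```python
import re

# Hand-translated from the consensus pattern above; the parentheses around S
# are added here so the active-site serine can be located in a match.
PROLYL_OLIGOPEPTIDASE = re.compile(
    r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}"
)


def active_site_serine(sequence: str):
    """Return (1-based position, residues remaining to the C-terminus) or None."""
    match = PROLYL_OLIGOPEPTIDASE.search(sequence)
    if match is None:
        return None
    index = match.start(1)  # 0-based index of the serine
    return index + 1, len(sequence) - index - 1


# Invented toy sequence built only to satisfy the pattern; it is far shorter
# than the 700 to 800 residues of the real enzymes.
toy = "MK" + "D" + "A" * 7 + "L" + "A" * 14 + "GASAGGLL" + "KK"
print(active_site_serine(toy))
```

For a full-length family member, the second value returned would be expected to fall near the figure of about 150 residues given above.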

Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases 
[4,E1]. 

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991). 
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992). 
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

938. Deoxyhypusine synthase (DS) 

Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid, 
hypusine [N epsilon-(4-aminobutyl-2-hydroxy)lysine]. The first step in the 
post-translational formation of hypusine is catalysed by the enzyme 
deoxyhypusine synthase (DS) EC:1.1.1.249. The modified version of eIF-5A, 
and DS, are required for eukaryotic cell proliferation [1]. 
Number of members: 9 

[1] Liao DI, Wolff EC, Park MH, Davies DR; Medline: 98154315 "Crystal structure of the 
NAD complex of human deoxyhypusine synthase: an enzyme with a ball-and-chain 
mechanism for blocking the active site." Structure 1998;6:23-32. 

939. (DUF21) 

Many of the sequences in this family are annotated as hemolysins; however, this is due to a 
similarity to Swiss:Q54318, which does not contain this domain. This domain is found at the 
N-terminus of the proteins, adjacent to two intracellular CBS domains. 
Number of members: 42 

940. (DUF59) 

This family includes prokaryotic proteins of unknown function. The family also includes 
PhaH Swiss:O84984 from Pseudomonas putida. PhaH forms a complex with PhaF 
Swiss:Q84982, PhaG Swiss:O84983 and PhaI Swiss:O84985, which hydroxylates 
phenylacetic acid to 2-hydroxyphenylacetic acid [1]. Members of this family may therefore all be 
components of ring-hydroxylating complexes. 
Number of members: 15 

[1] Olivera ER, Minambres B, Garcia B, Muniz C, Moreno MA, Ferrandez A, Diaz E, Garcia 
JL, Luengo JM; Medline: 98263372 "Molecular characterization of the phenylacetic acid 
catabolic pathway in Pseudomonas putida U: the phenylacetyl-CoA catabolon." Proc Natl 
Acad Sci U S A 1998;95:6419-6424. 

941. (DUF82) 

The protein contains four conserved cysteines that may be involved in metal binding or 
disulphide bridges. 
Number of members: 4 

942. Riboflavin kinase / FAD synthetase (FAD_Synth) 

This family consists of part of the bifunctional enzyme riboflavin kinase / FAD synthetase. 
These enzymes have both ATP:riboflavin 5'-phosphotransferase and ATP:FMN-adenylyltransferase 
activities [1]. They catalyse the 5'-phosphorylation of riboflavin to FMN 
and the adenylylation of FMN to FAD [1]. 

CAUTION: It is not clear whether this region of the enzymes catalyses either or both of the 
enzymatic reactions. 
Number of members: 27 

[1] Manstein DJ, Pai EF; Medline: 87057286 "Purification and characterization of FAD 
synthetase from Brevibacterium ammoniagenes." J Biol Chem 1986;261:16169-16173. 

943. [2Fe-2S] binding domain (fer2_2) 

[1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber 
R; Medline: 96072968 "Crystal structure of the xanthine oxidase-related aldehyde 
oxido-reductase from D. gigas." Science 1995;270:1170-1176. 
Number of members: 53 

944. Filovirus glycoprotein (Filo_glycop) 

This family includes an extracellular region from the envelope glycoprotein of Ebola and 
Marburg viruses. This region is also produced as a separate transcript that gives rise to a 
non-structural, secreted glycoprotein, which is produced in large amounts and has an unknown 
function [1]. Processing of this protein may be involved in viral pathogenicity [2]. 
Number of members: 23 

[1] Volchkov VE, Feldmann H, Volchkova VA, Klenk HD; Medline: 98245155 "Processing 
of the Ebola virus glycoprotein by the proprotein convertase furin." Proc Natl Acad Sci U S 
A 1998;95:5762-5767. 

[2] Sanchez A, Trappier SG, Mahy BW, Peters CJ, Nichol ST; Medline: 96195018 "The 
virion glycoproteins of Ebola viruses are encoded in two reading frames and are expressed 
through transcriptional editing." Proc Natl Acad Sci U S A 1996;93:3602-3607. 

945. Frataxin-like domain (Frataxin_Cyay) 

This family contains proteins that have a domain related to the globular C-terminus of 
Frataxin, the protein that is mutated in Friedreich's ataxia. This domain is found in a family of 
bacterial proteins. The function of this domain is currently unknown. 

[1] Gibson TJ, Koonin EV, Musco G, Pastore A, Bork P; Medline: 97084946 "Friedreich's 
ataxia protein: phylogenetic evidence for mitochondrial dysfunction." Trends Neurosci 
1996;19:465-468. 

946. (GAF) 

Domain present in phytochromes and cGMP-specific phosphodiesterases. 
Number of members: 296 

[1] Aravind L, Ponting CP; Medline: 98094688 "The GAF domain: an evolutionary link 
between diverse phototransducing proteins." Trends Biochem Sci 1997;22:458-459. 

947. Galaptin signature (Gal-bind_lectin) 

All vertebrates synthesize soluble galactoside-binding lectins [1,2,3] (also known as 
galectins, galaptins or S-lectin). These carbohydrate-binding proteins are developmentally 
regulated. Although their exact physiological role is not yet clear they seem to be involved in 
differentiation, cellular regulation and tissue construction. The sequence of galactoside- 
binding lectins from electric eel (electrolectin), conger eel (congerin), chicken and a number 
of mammalian species is known. These lectins are proteins of about 130 to 140 amino acid 
residues (14 Kd to 16 Kd). 

A number of other proteins are known to belong to this family: 

- Galectin-3 (also known as MAC-2 antigen, CBP-35 or IgE-binding protein), a 35 Kd lectin 
which binds immunoglobulin E and which is composed of two domains: an N-terminal domain 
that consists of tandem repeats of a glycine/proline-rich sequence and a C-terminal galaptin 
domain. 

- Galectin-4 [4], which is composed of two galaptin domains. 

- Galectin-5. 

- Galectin-7 [5], a keratinocyte protein which could be involved in cell-cell and/or cell- 
matrix interactions necessary for normal growth control. 

- Galectin-8 [6], which is composed of two galaptin domains. 




- Galectin-9 [7], which is composed of two galaptin domains. 

- Human eosinophil lysophospholipase (EC 3.1.1.5) [8] (Charcot-Leyden crystal protein), a 
protein that may have both enzymatic and lectin activities. It forms hexagonal 
bipyramidal crystals in tissues and secretions from sites of eosinophil-associated 
inflammation. 

- Caenorhabditis elegans 32 Kd lactose-binding lectin [9]. This lectin is composed of two 
galaptin domains. 

- Caenorhabditis elegans lec-7 and lec-8. 

One of the conserved regions of these lectins contains a tryptophan that has been shown [10] 
to be essential to the binding of galactosides. This region was used as a signature pattern for 
these proteins. 

Consensus pattern: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-[DENKHS]-[LIVMFC] [W binds carbohydrate] 
Sequences known to belong to this class detected by the pattern: ALL, except for pig galectin 4. 

[ 1] Barondes S.H., Gitt M.A., Leffler H., Cooper D.N.W. Biochimie 70:1627-1632(1988). 
[ 2] Hirabayashi J., Kasai K.-I. J. Biochem. 104:1-4(1988). 

[ 3] Barondes S.H., Castronovo V., Cooper D.N.W., Cummings R.D., Drickamer K., Feizi 
T., Gitt M.A., Hirabayashi J., Hughes C, Kasai K.-I., Leffler H., Liu F.-T., Lotan R., 
Mercurio A.M., Monsigny M., Pillair S., Poirer F., Raz A., Rigby P.W.J., Rini J.M., Wang 
J.L. Cell 76:597-598(1994). 

[ 4] Oda Y., Herrmann J., Gitt M., Turck C.W., Burlingame A.L., Barondes S.H., Leffler H. 
J. Biol. Chem. 268:5929-5939(1993). 

[ 5] Madsen P., Rasmussen H.H., Flint T., Gromov P., Kruse T.A., Honore B., Vorum H., 
Celis J.E. J. Biol. Chem. 270:5823-5829(1995). 

[ 6] Hadari Y.R., Paz K., Dekel R., Mestrovic T., Accili D., Zick Y. J. Biol. Chem. 270:3447- 
3453(1995). 

[ 7] Wada J., Kanwar Y.S. J. Biol. Chem. 272:6078-6086(1997). 

[ 8] Ackerman S.J., Corrette S.E., Rosenberg H.F., Bennett J.C., Mastrianni D.M., 
Nicholson-Weller A., Weller P.F., Chin D.T., Tenen D.G. J. Immunol. 150:456-468(1993). 

[ 9] Hirabayashi J., Satoh M., Kasai K.-I. J. Biol. Chem. 267:15485-15490(1992). 

[10] Abbott W.M., Feizi T. J. Biol. Chem. 266:5552-5557(1991). 

948. (GARS) Phosphoribosylglycinamide synthetase signature (phosphoribosylamine glycine 
ligase) 

PROSITE: PDOC00164; cross-reference(s): PS00184 

Phosphoribosylglycinamide synthetase (GARS) (phosphoribosylamine glycine ligase) [1] 
catalyzes the second step in the de novo biosynthesis of purine, the ATP-dependent 
addition of 5-phosphoribosylamine to glycine to form 5'-phosphoribosylglycinamide. 

In bacteria GARS is a monofunctional enzyme (encoded by the purD gene); in yeast it is part, 
with AIRS, of a bifunctional enzyme (encoded by the ADE5/7 gene); in higher eukaryotes it is 
part, with AIRS and with phosphoribosylglycinamide formyltransferase (GART), of a 
trifunctional enzyme (GARS-AIRS-GART). 

The sequence of GARS is well conserved. A highly conserved octapeptide was 
selected as a signature pattern. 

Consensus pattern: R-F-G-D-P-E-x-[QM] 

Sequences known to belong to this class detected by the pattern: ALL. 

[1] Aiba A., Mizobuchi K. J. Biol. Chem. 264:21239-21246(1989). 

949. GLTT - GLTT repeat (12 copies) 

This short repeat of unknown function is found in multiple copies in several C. elegans 
proteins. The repeat is five residues long and consists of XGLTT where X can be any amino 
acid. Number of members: 34. 
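
The repeat described above is simple enough to be counted directly. As a minimal sketch, assuming that copies are counted as non-overlapping five-residue units, the number of XGLTT repeats in a sequence could be obtained as follows; the function name and the toy sequence are illustrative assumptions only.

```python
import re

# X-G-L-T-T, where X can be any amino acid.
GLTT_REPEAT = re.compile(r".GLTT")


def count_gltt_repeats(sequence: str) -> int:
    """Count non-overlapping five-residue XGLTT units in a protein sequence."""
    return len(GLTT_REPEAT.findall(sequence))


# Invented toy sequence with three tandem copies.
toy = "MAGLTTSGLTTPGLTTKK"
print(count_gltt_repeats(toy))
```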

950. Glu_synthase - Conserved region in glutamate synthase 

This family represents a region of the glutamate synthase protein. This region is expressed as 
a separate subunit in the glutamate synthase alpha subunit from archaebacteria, or as part of a 
large multidomain enzyme in other organisms. The aligned region of these proteins contains a 
putative FMN binding site and Fe-S cluster. Number of members: 44. 

[1] Medline: 97082505. Sequence of the GLT1 gene from Saccharomyces cerevisiae reveals 
the domain structure of yeast glutamate synthase. Filetici P, Martegani MP, Valenzuela L, 
Gonzalez A, Ballario P; Yeast 1996;12:1359-1366. 

951. (Glyco_hydro_2) Glycosyl hydrolases family 2 signatures 




GLYCOSYL_HYDROL_F2_1; PS00608; GLYCOSYL_HYDROL_F2_2 

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

-Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and 
ebgA), Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella 
pneumoniae, Lactobacillus delbrueckii, or Streptococcus thermophilus and from the fungus 
Kluyveromyces lactis. 

-Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from mammals. 

One of the conserved regions in these enzymes is centered on a conserved glutamic acid 
residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base 
catalyst in the active site of the enzyme. This region has been used as a signature pattern. A 
highly conserved region located some sixty residues upstream from the active site glutamate 
has been selected as a second signature pattern. 

Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4) 
Sequences known to belong to this class detected by the pattern: ALL. 

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site residue] 
Sequences known to belong to this class detected by the pattern: ALL, except for Rhizobium meliloti lacZ. 

[1] Henrissat B. Biochem. J. 280:309-316(1991). 

[2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 
137:369-380(1991). 

[3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992). 

952. (Glyco_hydro_3) Glycosyl hydrolases family 3 active site 

PROSITE: PDOC00621. PROSITE cross-reference(s): PS00775; GLYCOSYL_HYDROL_F3 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of 
sequence similarities, classified into a single family: 

-Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula 
anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), 
Schizophyllum commune and Trichoderma reesei (BGL1). 

-Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio 
fibrisolvens (bglA), Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia 
chrysanthemi (bgxA) and Ruminococcus albus. 

-Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52). 

-Bacillus subtilis hypothetical protein yzbA. 

-Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus 
influenzae protein. 

One of the conserved regions in these enzymes is centered on a conserved aspartic 
acid residue which has been shown [3], in Aspergillus wentii beta-glucosidase A3, to be 
implicated in the catalytic mechanism. This region was used as a signature pattern. 

Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the active site residue] 

Sequences known to belong to this class detected by the pattern: ALL. 

[1] Henrissat B. Biochem. J. 280:309-316(1991). 

[2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992). 

[3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980). 

953. GP120 - Envelope glycoprotein GP120 

The entry of HIV requires interaction of viral GP120 with Swiss:P01730 and a chemokine 
receptor on the cell surface. Number of members: 17891 

[1] Medline: 98303379. Structure of an HIV gp120 envelope glycoprotein in complex with 
the CD4 receptor and a neutralizing human antibody. Kwong PD, Wyatt R, Robinson J, 
Sweet RW, Sodroski J, Hendrickson WA; Nature 1998;393:648-659. 

954. (GSPII_E) Bacterial type II secretion system protein E signature 
PROSITE: PDOC00567. PROSITE cross-reference(s): PS00662; T2SP_E 

A number of bacterial proteins, some of which are involved in a general secretion 
pathway (GSP) for the export of proteins (also called the type II pathway) [1,2], have been 
found to be evolutionarily related. These proteins are listed below: 

-The 'E' protein from the GSP operon of: Aeromonas (gene exeE); Erwinia (gene outE); 
Escherichia coli (gene yheG); Klebsiella pneumoniae (gene pulE); Pseudomonas aeruginosa 
(gene xcpR); Vibrio cholerae (gene epsE) and Xanthomonas campestris (gene xpsE). 

-Agrobacterium tumefaciens Ti plasmid virB operon protein 11. This protein is required for 
the transfer of T-DNA to plants. 

-Bacillus subtilis comG operon protein 1 which is required for the uptake of DNA by 
competent Bacillus subtilis cells. 

-Aeromonas hydrophila tapB, involved in type IV pilus assembly. 

-Pseudomonas protein pilB, which is essential for the formation of the pili. 

-Pseudomonas aeruginosa twitching motility protein pilT. 

-Neisseria gonorrhoeae type IV pilus assembly protein pilF. 

-Vibrio cholerae protein tcpT, which is involved in the biosynthesis of the tcp pilus. 

-Escherichia coli protein hofB (hopB). 
-Escherichia coli hypothetical protein ygcB. 
-Escherichia coli hypothetical protein yggR. 

These proteins have from 344 (pilT and virB11) to 568 (tapB) amino acids; they are 
probably cytoplasmically located and, on the basis of the presence of a conserved P-loop 
region (see <PDOC00017>), probably bind ATP. A region that overlaps the 'B' motif of 
ATP-binding proteins was selected as a signature pattern. 

Consensus pattern: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D 

Sequences known to belong to this class detected by the pattern: ALL, except for ygcB. 

[1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993). 

[2] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993). 

955. (guanylate_cyc) Guanylate cyclases signature 

PROSITE: PDOC00425. PROSITE cross-reference(s): PS00452; GUANYLATE_CYCLASES 

Guanylate cyclases (EC 4.6.1.2) [1 to 4] catalyze the formation of cyclic GMP (cGMP) from 
GTP. cGMP acts as an intracellular messenger, activating cGMP-dependent kinases and 
regulating cGMP-sensitive ion channels. The role of cGMP as a second messenger in vascular 
smooth muscle relaxation and retinal phototransduction is well established. Guanylate cyclase 
is found in both the soluble and particulate fractions of eukaryotic cells. The soluble and 
plasma membrane-bound forms differ in structure, regulation and other properties. 

Most currently known plasma membrane-bound forms are receptors for small 
polypeptides. The topology of such proteins is the following: they have an N-terminal 
extracellular domain which acts as the ligand binding region, then a transmembrane domain, 
followed by a large cytoplasmic C-terminal region that can be subdivided into two domains: a 
protein kinase-like domain that appears important for proper signalling and a cyclase catalytic 
domain. This topology is schematically represented below. 

+-----------------xxxxx---------------------+-----------+
| Ligand-binding  XXXXX Protein Kinase like |  Cyclase  |
+-----------------xxxxx---------------------+-----------+
  Extracellular   Transmembrane        Cytoplasmic 

The known guanylate cyclase receptors are: 

-The sea urchin receptors for speract and resact, which are small peptides that stimulate 
sperm motility and metabolism. 

-The receptors for natriuretic peptides (ANF). Two forms of ANF receptors with guanylate 
cyclase activity are currently known: GC-A (or ANP-A) which seems specific to atrial 
natriuretic peptide (ANP), and GC-B (or ANP-B) which seems to be stimulated more 
effectively by brain natriuretic peptide (BNP) than by ANP. 

-The receptor for Escherichia coli heat-stable enterotoxin (GC-C). The endogenous ligand 
for this intestinal receptor seems to be a small peptide called guanylin. 

-Retinal guanylate cyclase (retGC) which probably plays a specific functional role in the 
rods and/or cones of photoreceptors. It is not known if this protein acts as a receptor, but its 
structure is similar to that of the other plasma membrane-bound GCs. 

The soluble forms of guanylate cyclase are cytoplasmic heterodimers. The two 
subunits, alpha and beta, are proteins of 70 to 82 Kd which are highly related. Two 
forms of beta subunits are currently known: beta-1 which seems to be expressed in lung and 
brain, and beta-2 which is more abundant in kidney and liver. 

The membrane and cytoplasmic forms of guanylate cyclase share a conserved domain 
which is probably important for the catalytic activity of the enzyme. Such a domain is also 
found twice in the different forms of membrane-bound adenylate cyclases (also known as 
class-III) [5,6] from mammals, slime mold or Drosophila. A consensus pattern was derived 
from the most conserved region in that domain. 

Consensus pattern: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-[DNTA]-x(5)-[DE] 

Sequences known to belong to this class detected by the pattern: ALL, except for the sea 
urchin Arbacia punctulata resact receptor, which lacks this domain. 

Note: this pattern will detect both domains of class-III adenylate cyclases. 

[1] Koesling D., Boehme E., Schultz G. FASEB J. 5:2785-2791(1991). 

[2] Garbers D.L. New Biol. 2:499-504(1990). 

[3] Garbers D.L. Cell 71:1-4(1992). 

[4] Yuen P.S.T., Garbers D.L. Annu. Rev. Neurosci. 15:193-225(1992). 

[5] Iyengar R. FASEB J. 7:768-775(1993). 

[6] Barzu O., Danchin A. Prog. Nucleic Acid Res. Mol. Biol. 49:241-283(1994). 

956. Hemolysin-type calcium-binding region signature (HemolysinCabind) 

Gram-negative bacteria produce a number of proteins which are secreted into the growth 
medium by a mechanism that does not require a cleaved N-terminal signal sequence. These 
proteins, while having different functions, seem [1] to share two properties: they bind 
calcium and they contain a variable number of tandem repeats consisting of a nine amino acid 
motif rich in glycine, aspartic acid and asparagine. It has been shown [2] that such a domain 
is involved in the binding of calcium ions in a parallel beta roll structure. The proteins which 
are currently known to belong to this category are: 

- Hemolysins from various species of bacteria. Bacterial hemolysins are exotoxins that attack 
blood cell membranes and cause cell rupture. The hemolysins which are known to contain 
such a domain are those from: E. coli (gene hlyA), A. pleuropneumoniae (gene appA), A. 
actinomycetemcomitans and P. haemolytica (leukotoxin) (gene lktA). 

- Cyclolysin from Bordetella pertussis (gene cyaA). A multifunctional protein which is both 
an adenylate cyclase and a hemolysin. 

- Extracellular zinc proteases: serralysin (EC 3.4.24.40) from Serratia, prtB and prtC from 
Erwinia chrysanthemi and aprA from Pseudomonas aeruginosa. 

- Nodulation protein nodO from Rhizobium leguminosarum. 

A signature pattern was derived from conserved positions in the sequence of the 
calcium-binding domain. 

Consensus pattern: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D 
Sequences known to belong to this class detected by the pattern: ALL. 

Note: This pattern is found once in nodO and the extracellular proteases but up to 5 times in 
some hemolysin/cyclolysins. 
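
Since the pattern spans about two of the nine-residue repeats, hits within a tandem array may overlap one another. The sketch below shows one way such overlapping hits could be listed. It rests on several assumptions: the regular expression is a hand translation of the consensus pattern above, counting overlapping hits separately is a choice made here for illustration and is not stated in the source, and the function name and toy sequence are hypothetical.

```python
import re

# Hand-translated from the consensus pattern above. The lookahead (?=...)
# lets hits that overlap each other be reported separately.
CA_BINDING = re.compile(r"(?=(D.[LI].{4}G.D.[LI].GG.{3}D))")


def hit_positions(sequence: str):
    """Return the 1-based start of every (possibly overlapping) hit."""
    return [m.start() + 1 for m in CA_BINDING.finditer(sequence)]


# Invented toy sequence: three copies of a nine-residue unit plus one residue.
toy = "DALAGGAGA" * 3 + "D"
print(hit_positions(toy))
```

A scan without the lookahead would report only the first of two overlapping hits, so the count obtained depends on which convention is adopted.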

[ 1] Economou A., Hamilton W.D.O., Johnston A.W.B., Downie J.A. EMBO J. 9:349- 
354(1990). 

[ 2] Baumann U., Wu S., Flaherty K.M., McKay D.B. EMBO J. 12:3357-3364(1993). 

957. Hint module (Hint) 

This is an alignment of the Hint module in the Hedgehog proteins. It does not include any 
Inteins which also possess the Hint module. 
Number of members: 36 

[1] Hall TM, Porter JA, Young KE, Koonin EV, Beachy PA, Leahy DJ; Medline: 97474313 
"Crystal structure of a Hedgehog autoprocessing domain: homology between Hedgehog and 
self-splicing proteins." Cell 1997;91:85-97. 

958. Hydantoinase/oxoprolinase (Hydantoinase) 

This family includes the enzymes hydantoinase and oxoprolinase EC:3.5.2.9. Both reactions 
involve the hydrolysis of 5-membered rings via hydrolysis of their internal imide bonds [1]. 
Number of members: 14 

[1] Ye GJ, Breslow EB, Meister A, Guo-jie GE [corrected to Ye GJ]; Medline: 97113037 
"The amino acid sequence of rat kidney 5-oxo-L-prolinase determined by cDNA cloning" 
[published erratum appears in J Biol Chem 1997 Feb 14;272(7):4646] J Biol Chem 
1996;271:32293-32300. 

959. IMP dehydrogenase / GMP reductase signature (IMPDH_N) 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo 
GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP 
dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is 
associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian 
and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase 
isozymes in humans [2]. 

GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive 
deamination of GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide 
derivatives of G to A nucleotides, and maintains intracellular balance of A and G nucleotides. 

IMP dehydrogenase and GMP reductase share many regions of sequence similarity. One of 
these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. 
This region was used as a signature pattern. 

Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue] 
Sequences known to belong to this class detected by the pattern: ALL. 

[ 1] Collart F.R., Huberman K J. Biol. Chem. 263:15769-15772(1988). 

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 
265:5292-5295(1990). 

[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988). 

960. impB/mucB/samB family (IMS) 

These proteins are involved in UV protection (Swiss). 
Number of members: 38 

961. Type II intron maturase (Intron_maturas2) 

Group II introns use intron-encoded reverse transcriptase, maturase and DNA endonuclease 
activities for site-specific insertion into DNA [2]. Although this type of intron is self-splicing 
in vitro, it requires a maturase protein for splicing in vivo. It has been shown that a specific 
region of the aI2 intron is needed for the maturase function [1]. This region was found to be 
conserved in group II introns and called domain X [3]. 

Number of members: 335 

[1] Moran JV, Mecklenburg KL, Sass P, Belcher SM, Mahnke D, Lewin A, Perlman P; 
Medline: 94301788 "Splicing defective mutants of the COXI gene of yeast mitochondrial 
DNA: initial definition of the maturase domain of the group II intron aI2." Nucleic Acids Res 
1994;22:2057-2064. 

[2] Guo H, Zimmerly S, Perlman PS, Lambowitz AM; Medline: 98031910 "Group II intron 
endonucleases use both RNA and protein subunits for recognition of specific sequences in 
double-stranded DNA." EMBO J 1997;16:6835-6848. 

[3] Mohr G, Perlman PS, Lambowitz AM; Medline: 94077696 "Evolutionary relationships 
among group II intron-encoded proteins and identification of a conserved domain that may be 
related to maturase function." Nucleic Acids Res 1993;21:4991-4997. 

962. LAGLIDADG endonuclease (Intron_maturase) 

[1] Heath PJ, Stephens KM, Monnat RJ Jr, Stoddard BL; Medline: 97331323 "The structure 
of I-CreI, a group I intron-encoded homing endonuclease." Nat Struct Biol 1997;4:468-476. 

[2] Belfort M, Roberts RJ; Medline: 97402526 "Homing endonucleases: keeping the house in 
order." Nucleic Acids Res 1997;25:3379-3388. 

[3] Dalgaard JZ, Klar AJ, Moser MJ, Holley WR, Chatterjee A, Mian IS; Medline: 98026854 
"Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases 
and identification of an intein that encodes a site-specific endonuclease of the HNH family." 
Nucleic Acids Res 1997;25:4626-4638. 

Number of members: 220 



963. Isopentenyl transferase (IPT) 

Isopentenyl transferase / dimethylallyl transferase synthesizes isopentenyladenosine 
5'-monophosphate, a cytokinin that induces shoot formation on host plants infected with the Ti 
plasmid [1]. 

Number of members: 16 

[1] Canaday J, Gerad JC, Crouzet P, Otten L; Medline: 93101133 "Organization and 
functional analysis of three T-DNAs from the vitopine Ti plasmid pTiS4." Mol Gen Genet 
1992;235:292-303. 

964. Laminin EGF-like (Domains III and V) (laminin_EGF) 

This family is like EGF but has 8 conserved cysteines instead of 6. 
Number of members: 501 

[1] Engel J; Medline: 93041759 "Laminins and other strange proteins." Biochemistry 
1992;31:10643-10651. 

965. Legume lectins signatures (lectin_legA) 

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. 
These lectins are generally found in the seeds. The exact function of legume lectins is not 
known but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and 
in the protection against pathogens. Legume lectins bind calcium and manganese (or other 
transition metals). 

Legume lectins are synthesized as precursor proteins of about 230 to 260 amino acid 
residues. Some legume lectins are proteolytically processed to produce two chains: beta 
(which corresponds to the N-terminal) and alpha (C-terminal). The lectin concanavalin A 
(conA) from jack bean is exceptional in that the two chains are transposed and ligated (by 
formation of a new peptide bond). The N-terminus of mature conA thus corresponds to that 
of the alpha chain and the C-terminus to the beta chain. 
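
The chain transposition described above for concanavalin A amounts to a circular permutation of the precursor. A minimal Python sketch of that idea follows; the precursor string and the cleavage index are made up for illustration and are not taken from the conA sequence.

```python
def transpose_chains(precursor: str, cut: int) -> str:
    """Swap the N-terminal (beta) and C-terminal (alpha) chains of a precursor.

    `cut` is a hypothetical cleavage index: residues before it form the
    beta chain, the remaining residues form the alpha chain.
    """
    beta, alpha = precursor[:cut], precursor[cut:]
    # Ligation places the former C-terminal chain in front of the former
    # N-terminal chain, so the mature protein starts with alpha.
    return alpha + beta


# Toy precursor: lowercase marks the beta chain, uppercase the alpha chain.
precursor = "bbbbbb" + "AAAA"
mature = transpose_chains(precursor, cut=6)
print(mature)  # AAAAbbbbbb
```

The mature string begins with the alpha-chain residues and ends with the beta-chain residues, which mirrors the statement that the N-terminus of mature conA corresponds to the alpha chain.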

Two signature patterns were developed specific to legume lectins: the first is located in the
C-terminal section of the beta chain and contains a conserved aspartic acid residue important for
the binding of calcium and manganese; the second one is located in the N-terminal section of the
alpha chain.

Consensus pattern [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and 
calcium] Sequences known to belong to this class detected by the pattern ALL. 

Consensus pattern [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST] Sequences known to
belong to this class detected by the pattern ALL. 
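
Consensus patterns written in this notation can be scanned for mechanically. The sketch below is one possible translation of the notation into a Python regular expression, assuming the usual reading of the syntax (hyphen-separated elements, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats). The function name and the test peptide are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # [ABC] classes and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


# First legume lectin pattern from the text above.
lectin_regex = prosite_to_regex("[LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST]")
print(lectin_regex)  # [LIV][STAG]V[DEQV][FLI]D[ST]

# Made-up peptide containing one site that satisfies the pattern.
match = re.search(lectin_regex, "MKALISVEFDTGG")
print(match.group(), match.start())  # ISVEFDT 4
```

The conserved aspartic acid is the sixth element of the pattern, so in this toy peptide it sits at zero-based index `match.start() + 5`.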

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986).

966. Malate synthase signature (malate_synthase) 

Malate synthase (EC 4.1.3.2) catalyzes the aldol condensation of glyoxylate with acetyl-CoA
to form malate - the second step of the glyoxylate bypass, an alternative to the tricarboxylic 
acid cycle in bacteria, fungi and plants. Malate synthase is a protein of 530 to 570 amino 
acids whose sequence is highly conserved across species [1]. As a signature pattern, a very 
conserved region was selected in the central section of the enzyme. 

Consensus pattern [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F Sequences known
to belong to this class detected by the pattern ALL. 

[ 1] Bruinenberg P.G., Blaauw M., Kazemier B., Ab G. Yeast 6:245-254(1990). 

967. MatK/TrnK amino terminal region (MatK_N)

[1] Mohr G, Perlman PS, Lambowitz AM; Medline: 94077696 "Evolutionary relationships
among group II intron-encoded proteins and identification of a conserved domain that may be 
related to maturase function." Nucleic Acids Res 1993;21:4991-4997. 

Number of members: 495 



968. MOZ/SAS family (MOZ_SAS)

This region of these proteins has been suggested to be homologous to acetyltransferases [1].
However, the similarity is not supported by standard sequence analysis.
Number of members: 15 

[1] Kamine J, Elangovan B, Subramanian T, Coleman D, Chinnadurai G; Medline: 96182937
"Identification of a cellular protein that specifically interacts with the essential cysteine
region of the HIV-1 Tat transactivator" Virology 1996;216:357-366.
[2] Reifsnyder C, Lowell J, Clarke A, Pillus L; Medline: 96376969 "Yeast SAS silencing
genes and human genes associated with AML and HIV-1 Tat interactions are homologous
with acetyltransferases" [see comments] [published erratum appears in Nat Genet 1997
May;16(1):109] Nat Genet 1996;14:42-49.

969. mRNA capping enzyme (mRNA_cap_enzyme)

[1] Hakansson K, Doherty AJ, Shuman S, Wigley DB; Medline: 97304383 "X-ray
crystallography reveals a large conformational change during guanyl transfer by mRNA 
capping enzymes." Cell 1997;89:545-553. 

Number of members: 7 

970. DNA mismatch repair proteins mutS family signature (MutS_C)

Mismatch repair contributes to the overall fidelity of DNA replication [1]. It involves the 
correction of mismatched base pairs that have been missed by the proofreading element of the 
DNA polymerase complex. The sequences of some proteins involved in mismatch repair in
different organisms have been found to be evolutionarily related [2,3]. One of these families is
called mutS [4,E1]; it consists of:

- Prokaryotic mutS protein (also called hexA in Streptococcus pneumoniae). MutS is
thought to carry out the mismatch recognition step of DNA repair.

- Eukaryotic MSH1, which is involved in mitochondrial DNA repair. 

- Eukaryotic MSH2, which is involved in nuclear postreplication mismatch repair. MSH2 
heterodimerizes with MSH6. In man, MSH2 is involved in a form of familial hereditary 
nonpolyposis colon cancer (HNPCC).

- Eukaryotic MSH3, which is probably involved in the repair of large loops.

- Eukaryotic MSH4, which is involved in meiotic recombination. 

- Eukaryotic MSH5, which is involved in meiotic recombination. 

- Eukaryotic MSH6 (also known as G/T mismatch binding protein), a DNA-repair protein 
that binds to G/T mismatches through heterodimerization with MSH2. 

- Prokaryotic protein mutS2 whose function is not yet known. 

- A coral (Sarcophyton glaucum) mitochondrial encoded mutS-like protein. 

As a signature pattern for this class of mismatch repair proteins a region rich in glycine and
negatively charged residues was selected. This region is found in the C-terminal section of
these proteins, about 80 residues to the C-terminal of an ATP-binding site motif 'A' (P-loop)
(see <PDOC00017>).

Consensus pattern [ST]-[LIVMF]-x-[LIVM]-x-D-E-[LIVMFY]-[GC]-[RKH]-G-[GST]-x(4)-G
Sequences known to belong to this class detected by the pattern ALL, except for mutS2.
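
The stated spacing between the P-loop and the mutS signature can be checked on a sequence with two regular expressions. The sketch below is illustrative only: the P-loop expression assumes the commonly cited consensus [AG]-x(4)-G-K-[ST], which this text references but does not spell out, and the toy sequence is synthetic filler built to contain both motifs.

```python
import re

# Assumed P-loop (motif 'A') consensus; not stated in this document.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")
# Hand translation of the mutS consensus pattern given above.
MUTS_SIG = re.compile(r"[ST][LIVMF].[LIVM].DE[LIVMFY][GC][RKH]G[GST].{4}G")


def signature_offset(seq):
    """Return the distance from the P-loop start to the signature start.

    Returns None when either motif is missing or the signature does not
    lie C-terminal to the P-loop.
    """
    p = P_LOOP.search(seq)
    if p is None:
        return None
    s = MUTS_SIG.search(seq, p.end())  # only look C-terminal to the P-loop
    return None if s is None else s.start() - p.start()


# Synthetic sequence: P-loop, 72 filler residues, then the signature.
toy = "QQQ" + "GPNMGGKS" + "Q" * 72 + "SLVLLDEIGRGTSTYDG" + "QQ"
print(signature_offset(toy))  # 80
```

On real sequences the offset would be expected to vary around the figure quoted in the text rather than equal it exactly.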

[ 1] Modrich P. Annu. Rev. Biochem. 56:435-466(1987). 

[ 2] Haber L.T., Walker G.C. EMBO J. 10:2707-2715(1991). 

[ 3] New L., Liu K., Crouse G.F. Mol. Gen. Genet. 239:97-108(1993).

[ 4] Eisen J.A. Nucleic Acids Res. 26:4291-4300(1998). 

971. MutS family, N-terminal putative DNA binding domain (MutS_N)

This family consists of the N-terminal region of proteins in the mutS family of DNA 
mismatch repair proteins and is found associated with MutS_C located in the C-terminal 
region. The mutS family of proteins is named after the Salmonella typhimurium MutS protein
involved in mismatch repair; other members of the family include the eukaryotic MSH
1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human
MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a
mismatch binding protein [2]. The aligned region corresponds in part with domains A1, A2
(which may bind DNA) and B (which binds dsDNA in vitro) from T. thermophilus MutS as
characterised in [1].
Number of members: 43 

972. Domain in Myosin and Kinesin Tails (MyTH4)

Domain present twice in myosin-VIIa, and also present in 3 other myosins.

[1] Chen ZY, Hasson T, Kelley PM, Schwender BJ, Schwartz MF, Ramakrishnan M,
Kimberling WJ, Mooseker MS, Corey DP; Medline: 97038686 "Molecular cloning and
domain structure of human myosin-VIIa, the gene product defective in Usher syndrome 1B."
Genomics 1996;36:440-448.

Number of members: 21 

973. Sodium and potassium ATPases beta subunits signatures (Na_K-ATPase)

The sodium pump (Na+, K+ ATPase), located in the plasma membrane of all animal cells [1],
is a heterotrimer of a catalytic subunit (alpha chain), a glycoprotein subunit of about 34 Kd
(beta chain) and a small hydrophobic protein of about 6 Kd. The beta subunit seems [2] to 
regulate, through the assembly of alpha/beta heterodimers, the number of sodium pumps 
transported to the plasma membrane. 

Structurally the beta subunit is composed of a charged cytoplasmic domain of about 35 
residues, followed by a transmembrane region, and a large extracellular domain that contains 
three disulfide bonds and glycosylation sites. This structure is schematically represented in 
the figure below. 
[Figure: schematic of the beta subunit from N- to C-terminus, showing the cytoplasmic domain
(Cyt), the transmembrane region (TM) and the extracellular domain, with six conserved
cysteines paired in three disulfide bonds.]

'C': conserved cysteine involved in a disulfide bond.
'*': position of the patterns.

Two isoforms of the beta subunit (beta-1 and beta-2) are currently known; they share about 
50% sequence identity. Gastric (K+, H+) ATPase (proton pump), responsible for acid
production in the stomach, consists of two subunits [3]; the beta chain is highly similar to the
sodium pump beta subunits. Two signature patterns were developed for beta subunits. The
first is located in the cytoplasmic domain, while the second is found in the extracellular
domain and contains two of the cysteines involved in disulfide bonds.

Consensus pattern [FYW]-x(2)-[FYW]-x-[F^ 

Sequences known to belong to this class detected by the pattern ALL. 

Consensus pattern [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G [The two Cs are involved
in disulfide bonds] Sequences known to belong to this class detected by the pattern ALL,
except for the beta subunit of the sodium pump of brine shrimp whose sequence is highly
divergent in that region.

[ 1] Horisberger J.D., Lemas V., Krahenbul J.P., Rossier B.C. Annu. Rev. Physiol.
53:565-584(1991).

[ 2] McDonough A.A., Gerring K., Farley R.A. FASEB J. 4:1598-1605(1990).

[ 3] Toh B.-H., Gleeson P.A., Simpson R.J., Moritz R.L., Callaghan J.M., Goldkorn I., Jones
C.M., Martinelli T.M., Mu F.-T., Humphris D.C., Pettitt J.M., Mori Y., Masuda T.,
Sobieszczuk P., Weinstock J., Mantamadiotis T., Baldwin G.S. Proc. Natl. Acad. Sci. U.S.A.
87:6418-6422(1990).

974. Respiratory-chain NADH dehydrogenase subunit 1 signatures (NADHdh) 

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or 
NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex located in the inner 
mitochondrial membrane which also seems to exist in the chloroplast and in cyanobacteria 
(as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this 
bioenergetic enzyme complex there are fifteen which are located in the membrane part, seven 
of which are encoded by the mitochondrial and chloroplast genomes of most species. The 
most conserved of these organelle-encoded subunits is known as subunit 1 (gene ND1 in 
mitochondrion, and NDH1 in chloroplast) and seems to contain the ubiquinone binding site. 

The ND1 subunit is highly similar to subunit 4 of Escherichia coli formate hydrogenlyase
(gene hycD) and to subunit C of hydrogenase-4 (gene hyfC). Paracoccus denitrificans NQO8 and
Escherichia coli nuoH NADH-ubiquinone oxidoreductase subunits also belong to this family
[3]. Two signature patterns were developed based on conserved regions of this subunit.

Consensus pattern G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-
[LVMYST]-[LIVMFYG]-x-[KR]-[EQG] Sequences known to belong to this
class detected by the pattern ALL, except for watermelon and Leishmania ND1.

Consensus pattern P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G 
Sequences known to belong to this class detected by the pattern ALL, except for 
Chlamydomonas reinhardtii and Pisaster ochraceus ND1, and tobacco NDH1.

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109- 
122(1993). 

975. Nickel-dependent hydrogenases large subunit signatures (NiFeSe_Hases)

Hydrogenases are enzymes that catalyze the reversible activation of hydrogen and which 
occur widely in prokaryotes as well as in some eukaryotes. There are various types of 
hydrogenases, but all of them seem to contain at least one iron-sulfur cluster. They can be 
broadly divided into two groups: hydrogenases containing nickel and, in some cases, also 
selenium (the [NiFe] and [NiFeSe] hydrogenases) and those lacking nickel (the [Fe] 
hydrogenases). 

The [NiFe] and [NiFeSe] hydrogenases are heterodimers that consist of a small subunit that
contains a signal peptide and a large subunit. All the known large subunits seem to be
evolutionarily related [1]; they contain two Cys-x-x-Cys motifs, one at their N-terminal end,
the other at their C-terminal end. These four cysteines are involved in the binding of nickel 
[2]. In the [NiFeSe] hydrogenases the first cysteine of the C-terminal motif is a 
selenocysteine which has experimentally been shown to be a nickel ligand [3]. Two patterns 
were developed which are centered on the Cys-x-x-Cys motifs. 

Alcaligenes eutrophus possesses a NAD-reducing cytoplasmic hydrogenase (hoxS) [4]; this
enzyme is composed of four subunits. Two of these subunits (beta and delta) are responsible
for the hydrogenase reaction and are evolutionarily related to the large and small subunits of
membrane-bound hydrogenases. The alpha subunit of coenzyme F420 hydrogenase (EC
1.12.99.1) (FRH) from archaebacterial methanogens also belongs to this family.

Consensus pattern R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C [The two C's are 
nickel ligands] Sequences known to belong to this class detected by the pattern ALL. 

Consensus pattern [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H [The two C's are nickel ligands] 
Sequences known to belong to this class detected by the pattern ALL. 
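
Because both patterns mark specific cysteines as nickel ligands, capture groups can report where those residues sit in a sequence. The following sketch hand-translates the two patterns above into Python regular expressions; the function name and the toy sequence are hypothetical, and a selenocysteine written with a letter other than C would not be matched by these literal expressions.

```python
import re

# Hand translations of the two consensus patterns; the parenthesised
# cysteines are the residues annotated as nickel ligands.
N_TERM = re.compile(r"RG[LIVMF]E.{15}[QESM]R.(C)G[LIVM](C)")
C_TERM = re.compile(r"[FY]DP(C)[LIM][ASG](C).{2,3}H")


def nickel_ligand_positions(seq):
    """Return 1-based positions of the four ligand cysteines, or None."""
    positions = []
    for motif in (N_TERM, C_TERM):
        m = motif.search(seq)
        if m is None:
            return None
        positions += [m.start(1) + 1, m.start(2) + 1]
    return positions


# Synthetic sequence with one N-terminal and one C-terminal motif.
toy = "M" + "RGLE" + "A" * 15 + "QRICGVC" + "A" * 20 + "FDPCLACAAH"
print(nickel_ligand_positions(toy))  # [24, 27, 51, 54]
```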

[ 1] Menon N.K., Robbins J., Peck H.D. Jr., Chatelus C.Y, Choi E.-S., Przybyla A.E. J. 
Bacteriol. 172:1969-1977(1990). 

[ 2] Volbeda A., Charon M.-H., Piras C., Hatchikian E.C., Frey M., Fontecilla-Camps J.C.
Nature 373:580-587(1995). 

[ 3] Eidsness M.K., Scott R.A., Prickrill B., der Vartaninan D.V., LeGall J., Moura I., Moura
J.J.G., Peck H.D. Jr. Proc. Natl. Acad. Sci. U.S.A. 86:147-151(1989).

[ 4] Tran-Betcke A., Warnecke U., Boecker C., Zaborosch C., Friedrich B. J. Bacteriol.
172:2920-2929(1990).

976. NADH-Ubiquinone oxidoreductase (complex I), chain 5 C-terminus (oxidored_q1_C)

This sub-family represents a carboxyl terminal extension of oxidored_q1. Only NADH-
Ubiquinone chain 5 from chloroplasts are in this family. This sub-family is part of complex I 
which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is 
associated with proton translocation across the membrane. 
Number of members: 572 

[1] Walker JE; Medline: 93110040 "The NADH:ubiquinone oxidoreductase (complex I) of
respiratory chains." Q Rev Biophys 1992;25:253-324. 

977. NADH-Ubiquinone oxidoreductase (complex I), chain 5 N-terminus (oxidored_q1_N)

This sub-family represents an amino terminal extension of oxidored_q1. Only NADH-
Ubiquinone chain 5 and eubacterial chain L are in this family. This sub-family is part of
complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a
reaction that is associated with proton translocation across the membrane.
Number of members: 546 

[1] Walker JE; Medline: 93110040 "The NADH:ubiquinone oxidoreductase (complex I) of
respiratory chains." Q Rev Biophys 1992;25:253-324. 

978. oxidored_q2. NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 4L (EC 1.6.5.3). 
ND4L OR NAD4L. Arabidopsis thaliana (Mouse-ear cress). Mitochondrion. OC Eukaryota; 
Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; 
Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

CATALYTIC ACTIVITY: NADH + UBIQUINONE = NAD(+) + UBIQUINOL. 

[1] SEQUENCE FROM N.A. MEDLINE; 93156682. Brandt P., Sunkel S., Unseld M.,
Brennicke A., Knoop V.; "The nad4L gene is encoded between exon c of nad5 and orf25 in
the Arabidopsis mitochondrial genome."; Mol. Gen. Genet. 236:33-38(1992).
[2] SEQUENCE FROM N.A. STRAIN=CV. COLUMBIA; MEDLINE; 97141919 Unseld 
M., Marienfeld J.R., Brandt P., Brennicke A.; "The mitochondrial genome of Arabidopsis 
thaliana contains 57 genes in 366,924 nucleotides."; Nat. Genet. 15:57-61(1997). 

979. oxidored_q4. Protein name: NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN
3, CHLOROPLAST. Synonym(s): EC 1.6.5.3. Gene name(s): NDHC OR NDH3. From Zea
mays (Maize). Encoded on: Chloroplast. Taxonomy: Eukaryota; Viridiplantae; Embryophyta;
Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Zea.

CATALYTIC ACTIVITY: NADH + PLASTOQUINONE = NAD(+) + 
PLASTOQUINOL. 

SIMILARITY: BELONGS TO THE COMPLEX I SUBUNIT 3 FAMILY. 

[1] SEQUENCE FROM N.A. MEDLINE; 89281491. Steinmueller K., Ley A.C., Steinmetz
A.A., Sayre R.T., Bogorad L.; "Characterization of the ndhC-psbG-ORF157/159 operon of 
maize plastid DNA and of the cyanobacterium Synechocystis sp. PCC6803."; Mol. Gen. 
Genet. 216:60-69(1989). 

[2] SEQUENCE FROM N.A. MEDLINE; 95395841. Maier R.M., Neckermann K., Igloi
G.L., Koessel H.; "Complete sequence of the maize chloroplast genome: gene content,
hotspots of divergence and fine tuning of genetic information by transcript editing."; J. Mol.
Biol. 251:614-628(1995).

980. PAC: PAC motif 

PAC motif occurs C-terminal to a subset of all known PAS motifs. It is proposed to 
contribute to the PAS domain fold [3]. Number of members: 181 

[1] Medline: 97446881 PAS domain S-boxes in archaea, bacteria and sensors for oxygen and 
redox. Zhulin IB, Taylor BL, Dixon R; Trends Biochem Sci 1997;22:331-333. 
[2] Medline: 95275818. 1.4 A structure of photoactive yellow protein, a cytosolic 
photoreceptor: unusual fold, active site, and chromophore. Borgstahl GE, Williams DR, 
Getzoff ED; Biochemistry 1995;34:6278-6287. 

[3] Medline: 98044337. PAS: a multifunctional domain family comes to light. Ponting CP, 
Aravind L; Curr Biol 1997;7:674-677. 

981. PARP: Poly(ADP-ribose) polymerase catalytic region. 

Poly(ADP-ribose) polymerase catalyses the covalent attachment of ADP-ribose units from 
NAD+ to itself and to a limited number of other DNA binding proteins, which decreases their 
affinity for DNA. Poly(ADP-ribose) polymerase is a regulatory component induced by DNA 
damage. 

The carboxyl-terminal region is the most highly conserved region of the protein. Experiments 
have shown that a carboxyl 40 kDa fragment is still catalytically active [2]. Number of 
members: 19 

[1] Medline: 96353841 Structure of the catalytic fragment of poly(ADP-ribose) polymerase
from chicken. Ruf A, Mennissier de Murcia J, de Murcia G, Schulz GE; Proc Natl Acad Sci 
USA 1996;93:7481-7485. 

[2] Medline: 93293867 The carboxyl-terminal domain of human poly(ADP-ribose) 
polymerase. Overproduction in Escherichia coli, large scale purification, and 
characterization. Simonin F, Hofferer L, Panzeter PL, Muller S, de Murcia G, Althaus FR; J 
Biol Chem 1993;268:13454-13461. 

982. PC_rep: Proteasome/cyclosome repeat

[1] Medline: 97348748 A repetitive sequence in subunits of the 26S proteasome and 20S
cyclosome (anaphase-promoting complex). Lupas A, Baumeister W, Hofmann K; Trends
Biochem Sci 1997;22:195-196.
Number of members: 112

983. Peptidase_M1: Peptidase family M1

Members of this family are aminopeptidases. The members differ widely in specificity, 
hydrolysing acidic, basic or neutral N-terminal residues. This family includes leukotriene-A4
hydrolase (Swiss:P09960); this enzyme also has an aminopeptidase activity [1]. Number of
members: 72 

[1] Medline: 95405261 Evolutionary families of metallopeptidases. Rawlings ND, Barrett AJ; 
Meth Enzymol 1995;248:183-228. 

984. Neutral zinc metallopeptidases, zinc-binding region signature (Peptidase_M8)
PROSITE cross-reference(s) PS00142; ZINC_PROTEASE 

The majority of zinc-dependent metallopeptidases (with the notable exception of the 
carboxypeptidases) share a common pattern of primary structure [1,2,3] in the part of their 
sequence involved in the binding of zinc, and can be grouped together as a 
superfamily, known as the metzincins, on the basis of this sequence similarity. They can be
classified into a number of distinct families [4,E1] which are listed below along with the 
proteases which are currently known to belong to these families. 
Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN). 

- Mammalian aminopeptidase N (EC 3.4.11.2). 

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a 
role in regulating growth and differentiation of early B-lineage cells. 

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1). 

- Yeast hypothetical protein YIL137c. 

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of 
an epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is 
capable of peptidase activity.

Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the
enzyme responsible for hydrolyzing angiotensin I to angiotensin II. There are two forms
of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.
Family M3 

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic 
degradation of small peptides. 

- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal 
endopeptidase). 

- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the
second stage of processing of some proteins imported in the mitochondrion. 

- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD). 

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene
dcp). 

- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC). 

- Yeast hypothetical protein YKL134c. 
Family M4 

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases 
(bacillolysins) (EC 3.4.24.28) from various species of Bacillus. 

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB). 

- Extracellular elastase from Staphylococcus epidermidis. 

- Extracellular protease prt1 from Erwinia carotovora.

- Extracellular minor protease smp from Serratia marcescens. 

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio. 

- Protease prtA from Listeria monocytogenes. 

- Extracellular proteinase proA from Legionella pneumophila. 

Family M5 

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi. 
Family M6 

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of 
insect antibacterial proteins, attacins and cecropins. 



Family M7 

- Streptomyces extracellular small neutral proteases 
Family M8 

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from
various species of Leishmania. 

Family M9 

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio 
alginolyticus. 

Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia. 

- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA). 

- Secreted proteases A, B, C and G from Erwinia chrysanthemi. 

- Yeast hypothetical protein YIL108w. 

Family M10B 

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 
3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 
3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34)
(neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) 
(stromelysin-2), and MMP-11 (stromelysin-3), MMP-12 (EC 3.4.24.65) (macrophage 
metalloelastase). 

- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the
embryo to digest the protective envelope derived from the egg extracellular matrix. 

- Soybean metalloendoproteinase 1. 

Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE). 
Family M12A 

- Astacin (EC 3.4.24.21), a crayfish endoprotease. 

- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border
metalloendopeptidase.

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone 
formation and which expresses metalloendopeptidase activity. The Drosophila homolog 
of BMP-1 is the dorsal-ventral patterning protein tolloid.

- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN 
from Strongylocentrotus purpuratus. 

- Caenorhabditis elegans protein toh-2. 

- Caenorhabditis elegans hypothetical protein F42A10.8. 

- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE 
and HCE) from the fish Oryzias latipes. These proteases participate in the breakdown
of the egg envelope, which is derived from the egg extracellular matrix, at the time of 
hatching. 

Family M12B 

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in 
hemorrhage. Examples are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 
3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72), trimerelysin I (EC 
3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 

Family M13 

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of
endothelin to release the active peptide. 

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein 
is very probably a zinc endopeptidase. 

- Peptidase O from Lactococcus lactis (gene pepO). 

Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins 
(BoNT). These toxins are zinc proteases that block neurotransmitter release by 
proteolytic cleavage of synaptic proteins such as synaptobrevins, syntaxin and SNAP-25 
[7,8].

Family M30

- Staphylococcus hyicus neutral metalloprotease. 
Family M32 

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme 
from Thermus aquaticus which is most active at high temperature. 

Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the 
anthrax toxin. 

Family M35 

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various 
species of Aspergillus. 

Family M36 

- Extracellular elastinolytic metalloproteinases from Aspergillus. 

From the tertiary structure of thermolysin, the positions of the residues acting as zinc
ligands and those involved in the catalytic activity are known. Two of the zinc ligands are
histidines which are very close together in the sequence; C-terminal to the first histidine is
a glutamic acid residue which acts as a nucleophile and promotes the attack of a water
molecule on the carbonyl carbon of the substrate. A signature pattern which includes the
two histidines and the glutamic acid residue is sufficient to detect this superfamily of
proteins.

Consensus pattern [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]
[The two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern ALL, except
for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 57, including Neurospora crassa
conidiation-specific protein 13 which could be a zinc-protease. 
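
The zinc-binding pattern above contains an excluded-residue element, {DEHRKP}, which corresponds to a negated character class in a regular expression. The sketch below hand-translates the pattern and reports the two histidines and the glutamic acid; the function name and both peptides are made up for illustration.

```python
import re

# Hand translation of the zinc-binding consensus pattern; {DEHRKP}
# becomes the negated class [^DEHRKP]. Groups capture H, E and H.
ZINC_SITE = re.compile(
    r"[GSTALIVN].{2}(H)(E)[LIVMFYW][^DEHRKP](H).[LIVMFYWGSPQ]"
)


def zinc_site_residues(seq):
    """Return 1-based positions (His, Glu, His) of the first match, or None."""
    m = ZINC_SITE.search(seq)
    if m is None:
        return None
    return tuple(m.start(i) + 1 for i in (1, 2, 3))


# Made-up peptide carrying an H-E-x-x-H motif that satisfies the pattern.
print(zinc_site_residues("MKVVAHELTHAVTD"))  # (6, 7, 10)

# Same peptide with an excluded residue (D) at the {DEHRKP} position.
print(zinc_site_residues("MKVVAHELDHAVTD"))  # None
```

The second call shows why the exclusion matters: a bare H-E-x-x-H search would accept both peptides, while the full pattern rejects the one with aspartic acid at the excluded position.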
[1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).
[2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).
[3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B.,
Stoecker W. Zoology 99:237-246(1996).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
[5] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[6] Hite L.A., Fox J.W., Bjarnason J.B. Biol. Chem. Hoppe-Seyler 373:381-385(1992).
[7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).
[8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

985. PHO4: Phosphate transporter family

This family includes PHO-4 from Neurospora crassa, which is a Na(+)-phosphate
symporter [1]. This family also contains the leukemia virus receptor Swiss:Q08344. Number 
of members: 41 

[1] Medline: 95249577 Repressible cation-phosphate symporters in Neurospora crassa. 
Versaw WK, Metzenberg RL; Proc Natl Acad Sci U S A 1995;92:3884-3887. 

986. Photosynthetic reaction center proteins signature (photoRC) 
PROSITE cross-reference(s): PS00244; REACTION_CENTER

In the photosynthetic reaction center of purple bacteria, two homologous integral 
membrane proteins, L(ight) and M(edium), are known to be essential to the light-mediated 
water-splitting process. In the photosystem II of eukaryotic chloroplasts two related 
proteins are involved: the D1 (psbA) and D2 proteins (psbD). These four types of protein
probably evolved from a common ancestor [see 1,2 for recent reviews]. 

A signature pattern was developed which includes two conserved histidine residues. In L
and M chains, the first histidine is a ligand of the magnesium ion of the special pair 
bacteriochlorophyll, the second is a ligand of a ferrous non-heme iron atom. In photosystem 
II these two histidines are thought to play a similar role. 

Consensus pattern [NQH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2)
[The first H is a magnesium ligand] [The second H is an iron ligand]
Sequences known to belong to this class detected by the pattern ALL, except
for broad bean psbA which has Gln instead of the second His.

[1] Michel H., Deisenhofer J. Biochemistry 27:1-7(1988).
[2] Barber J. Trends Biochem. Sci. 12:321-326(1987).

987. phytochrome: Phytochrome region 

This family contains a region specific to phytochrome proteins. Number of members: 
145 

988. PI3K_C2: C2 domain 

Phosphoinositide 3-kinase region postulated to contain a C2 domain. Outlier of C2 family. 
Number of members: 39 

[1] Medline: 97388296 Using structure to define the function of phosphoinositide 3-kinase
family members. Domin J, Waterfield MD; FEBS Lett 1997;410:91-95.
[2] Medline: 97398940 Phosphoinositide 3-kinases: a conserved family of signal transducers.
Vanhaesebroeck B, Leevers SJ, Panayotou G, Waterfield MD; Trends Biochem Sci
1997;22:267-272.

989. PI3Ka: Phosphoinositide 3-kinase family, accessory domain (PIK domain) 
PIK domain is conserved in all PI3 and PI4-kinases. Its role is unclear but it has been 
suggested [2] to be involved in substrate presentation. 

Number of members: 47 

[1] Medline: 97388296 Using structure to define the function of phosphoinositide 3-kinase 
family members. Domin J, Waterfield MD; FEBS Lett 1997;410:91-95. 
[2] Medline: 94069320 Phosphatidylinositol 4-kinase: gene structure and requirement for 
yeast cell viability. Flanagan CA, Schnieders EA, Emerick AW, Kunisawa R, Admon A, 
Thorner J; Science 1993;262:1444-1448. 

990. P-II protein signatures 

PROSITE cross-reference(s): PS00496; PII_GLNB_UMP, PS00638; PII_GLNB_CTER

The P-II protein (gene glnB) is a bacterial protein important for the control of glutamine
synthetase [1,2,3]. In nitrogen-limiting conditions, when the ratio of glutamine to
2-ketoglutarate decreases, P-II is uridylylated on a tyrosine residue to form P-II-UMP.
P-II-UMP allows the deadenylation of glutamine synthetase (GS), thus activating the enzyme.
Conversely, in nitrogen excess, P-II-UMP is deuridylylated and then promotes the adenylation
of GS. P-II also indirectly controls the transcription of the GS gene (glnA) by preventing
NR-II (ntrB) from phosphorylating NR-I (ntrC), which is the transcriptional activator of glnA.
Once P-II is uridylylated, these events are reversed. 
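
The regulatory switch described above can be summarised as a small truth table. The sketch below is a qualitative simplification written for illustration: the function and key names are hypothetical, and it assumes that phosphorylated NR-I is the form that activates glnA, which the text implies but does not state outright.

```python
def pii_regulation(nitrogen_limiting: bool) -> dict:
    """Qualitative summary of the P-II switch for one nitrogen condition."""
    # P-II is uridylylated when nitrogen is limiting, and not otherwise.
    pii_ump = nitrogen_limiting
    return {
        "P-II form": "P-II-UMP" if pii_ump else "P-II",
        # P-II-UMP allows deadenylation of GS; unmodified P-II promotes
        # adenylation of GS.
        "GS adenylated": not pii_ump,
        "GS active": pii_ump,
        # Unmodified P-II keeps NR-II from phosphorylating NR-I; this is
        # reversed once P-II is uridylylated (assumed to activate glnA).
        "NR-I phosphorylated": pii_ump,
        "glnA transcription activated": pii_ump,
    }


for condition in (True, False):
    label = "nitrogen limiting" if condition else "nitrogen excess"
    print(label, pii_regulation(condition))
```

Under nitrogen limitation every activating entry is true; under nitrogen excess the table flips, matching the "these events are reversed" statement.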

P-II is an extremely well conserved protein of about 110 amino acid residues. The tyrosine
which is uridylylated is located in the central part of the protein.

In cyanobacteria, P-II seems to be phosphorylated on a serine residue rather than being
uridylylated.

In methanogenic archaebacteria, the nitrogenase iron protein gene (nifH) is followed by two 
open reading frames highly similar to the eubacterial P-II protein [4]. These proteins could 
be involved in the regulation of nitrogen fixation. 

In the red alga, Porphyra purpurea, there is a glnB homolog encoded in the chloroplast 
genome. 

Other proteins highly similar to glnB are: 

- Bacillus subtilis protein nrgB [5]. 

- Escherichia coli hypothetical protein ybaI [6].

Two signature patterns were developed for P-II protein. The first one is a conserved
stretch (in eubacteria) of six residues which contains the uridylylated tyrosine; the other
is derived from a conserved region in the C-terminal part of the P-II protein.

Consensus pattern Y-[KR]-G-[AS]-[AE]-Y [The second Y is uridylylated]
Sequences known to belong to this class detected by the pattern ALL glnB's
from eubacteria.

Consensus pattern [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM]
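
Consensus patterns of this kind can be searched for with ordinary regular expressions. The following is a minimal sketch, assuming only the pattern elements that appear in this section: single residues, `[..]` residue classes, `x` wildcards, and `(n)` or `(n,m)` repeat counts. The function name `prosite_to_regex` and the test peptide are hypothetical; the peptide is synthetic and is not a real glnB sequence.

```python
import re

# One pattern element: x, a single residue, a [..] class or a {..} exclusion,
# optionally followed by a repeat count "(n)" or "(n,m)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PII_UMP = "Y-[KR]-G-[AS]-[AE]-Y"
PII_CTER = "[ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM]"

toy_peptide = "MKKIEAIIKPYRGAEYSVDF"  # synthetic example, not a database sequence
hit = re.search(prosite_to_regex(PII_UMP), toy_peptide)
print(prosite_to_regex(PII_UMP), prosite_to_regex(PII_CTER))
print(hit.group(), hit.start())
```

The same function can be applied to the other consensus patterns listed in this section, including those with variable spacers such as x(2,3).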

[1] Magasanik B. Biochimie 71:1005-1012(1989).
[2] Holtel A., Merrick M. Mol. Gen. Genet. 215:134-138(1988).
[3] Cheah E., Carr P.D., Suffolk P.M., Vasuvedan S.G., Dixon N.E., Ollis D.L. Structure
2:981-990(1994).
[4] Sibold L., Henriquet M., Possot O., Aubert J.-P. Res. Microbiol. 142:5-12(1991).
[5] Wray L.V. Jr., Atkinson M.R., Fisher S.H. J. Bacteriol. 176:108-114(1994).
[6] Allikmets R., Gerrard B.C., Court D., Dean M.C. Gene 136:231-236(1993).

991. PIP5K: Phosphatidylinositol-4-phosphate 5-Kinase 

This family contains a region from the common kinase core found in the type I 
phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family as described in [1]. The family 
consists of various type I, II and III PIP5K enzymes. PIP5K catalyses the formation of
phosphatidylinositol-4,5-bisphosphate via the phosphorylation of
phosphatidylinositol-4-phosphate, a precursor in the phosphoinositide signaling pathway.
Number of members: 33

[1] Medline: 98204859. Type I phosphatidylinositol-4-phosphate 5-kinases. Cloning of the 
third isoform and deletion/substitution analysis of members of this novel lipid kinase family. 
Ishihara H, Shibasaki Y, Kizuki N, Wada T, Yazaki Y, Asano T, Oka Y; J Biol Chem 
1998;273:8741-8748. 

[2] Medline: 97115834 Type I phosphatidylinositol-4-phosphate 5-kinases are distinct 
members of this novel lipid kinase family. Loijens JC, Anderson RA; J Biol Chem
1996;271:32937-32943.

992. PolyA_pol: Poly A polymerase family 

This family includes nucleic acid independent RNA polymerases, such as poly(A)
polymerase, which adds the poly(A) tail to mRNA (EC:2.7.7.19). This family also includes the
tRNA nucleotidyltransferase that adds the CCA to the 3' end of the tRNA
(EC:2.7.7.25). Number of members: 31

[1] Medline: 93066242 Identification of the gene for an Escherichia coli poly(A) polymerase. 
Cao GJ, Sarkar N; Proc Natl Acad Sci U S A 1992;89:10380-10384. 

993. Photosystem I psaA and psaB proteins signature (psaA_psaB)
PROSITE cross-reference(s): PS00419; PHOTOSYSTEM_I_PSAAB

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to 
mediate electron transfer from plastocyanin to ferredoxin. PSI is found in the chloroplast 
of plants and cyanobacteria. The electron transfer components of the reaction center of 
PSI are a primary electron donor P-700 (chlorophyll dimer) and five electron acceptors: A0
(chlorophyll), A1 (a phylloquinone) and three 4Fe-4S iron-sulfur centers: Fx, Fa, and Fb.

PsaA and psaB, two closely related proteins, are involved in the binding of P700, A0, A1,
and Fx. psaA and psaB are both integral membrane proteins of 730 to 750 amino acids that 
seem to contain 11 transmembrane segments. The Fx 4Fe-4S iron-sulfur center is bound by 
four cysteines; two of these cysteines are provided by the psaA protein and the two others 
by psaB. The two cysteines in both proteins are proximal and located in a loop between 
the ninth and tenth transmembrane segments. A leucine zipper motif seems to be present [2] 
downstream of the cysteines and could contribute to dimerization of psaA/psaB. 

The signature pattern for these proteins is based on the perfectly conserved region that 
includes the two iron-sulfur binding cysteines. 

Consensus pattern C-D-G-P-G-R-G-G-T-C [The two C's bind the iron-sulfur center]

[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).
[2] Webber A.N., Malkin R. FEBS Lett. 264:1-14(1990).

994. PSBH: Photosystem II 10 kDa phosphoprotein 

This protein is phosphorylated in a light dependent reaction. 
Number of members: 20 

995. PsbJ 

This family consists of the photosystem II reaction center protein PsbJ from plants and
cyanobacteria. In Synechocystis sp. PCC 6803, PsbJ regulates the number of photosystem II
centers in thylakoid membranes; it is a predicted 4 kDa protein with one membrane-spanning
domain [1]. Number of members: 20

[1] Medline: 93131892. Genetic and immunological analyses of the cyanobacterium
Synechocystis sp. PCC 6803 show that the protein encoded by the psbJ gene regulates the 
number of photosystem II centers in thylakoid membranes. Lind LK, Shukla VK, Nyhus KJ, 
Pakrasi HB; J Biol Chem 1993;268:1575-1579. 

996. PSBT: Photosystem II reaction centre T protein 

The exact function of this protein is unknown. It probably consists of a single transmembrane 
spanning helix. The Swiss:P37256 protein appears to be (i) a novel photosystem II subunit
and (ii) required for maintaining optimal photosystem II activity under adverse growth 
conditions [1]. Number of members: 17 

[1] Medline: 94298765. The chloroplast ycf8 open reading frame encodes a 
photosystem II polypeptide which maintains photosynthetic activity under adverse growth 
conditions. Monod C, Takahashi Y, Goldschmidt-Clermont M, Rochaix JD; EMBO J 
1994;13:2747-2754. 

997. PSI_8. PHOTOSYSTEM I REACTION CENTRE SUBUNIT VIII. Synonym(s): PSI-I.
Gene name(s): PSAI. From Hordeum vulgare (Barley). Encoded on Chloroplast. Taxonomy:
Eukaryota; Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta;
Liliopsida; Poales; Poaceae; Hordeum.

MAY HELP IN THE ORGANIZATION OF THE PSAL SUBUNIT. BELONGS TO THE 
PSAI FAMILY. 

[1] SEQUENCE FROM N.A. MEDLINE; 90036933. Scheller H.V., Okkels J.S., Hoej P.B.,
Svendsen I., Roepstorff P., Moeller B.L.; "The primary structure of a 4.0-kDa photosystem I
polypeptide encoded by the chloroplast psaI gene."; J. Biol. Chem. 264:18402-18406(1989).

998. PSI_PsaJ: Photosystem I reaction centre subunit IX / PsaJ 

This family consists of the photosystem I reaction centre subunit IX or PsaJ from various 
organisms including Synechocystis sp. (strain PCC 6803), Pinus thunbergii (green pine) and
Zea mays (maize). PsaJ Swiss:P19443 is a small 4.4 kDa, chloroplast-encoded, hydrophobic
subunit of the photosystem I reaction complex; its function is not yet fully understood [1].
PsaJ can be cross-linked to PsaF Swiss:P12356 and has a single predicted transmembrane
domain; it has a proposed role in maintaining PsaF in the correct orientation to allow for fast
electron transfer from soluble donor proteins to P700+ [1]. Number of members: 18

[1] Medline: 99238330. A large fraction of PsaF is nonfunctional in photosystem I complexes 
lacking the PsaJ subunit. Fischer N, Boudreau E, Hippler M, Drepper F, Haehnel W, Rochaix 
JD; Biochemistry 1999;38:5546-5552. 

[2] Medline: 93252282. Genes encoding eleven subunits of photosystem I from the 
thermophilic cyanobacterium Synechococcus sp. Muhlenhoff U, Haehnel W, Witt H, 
Herrmann RG; Gene 1993;127:71-78. 

999. PSII. Protein name: PHOTOSYSTEM II P680 CHLOROPHYLL A APOPROTEIN.
Synonym(s): CP-47 PROTEIN. Gene name(s): PSBB. From Hordeum vulgare (Barley).
Encoded on Chloroplast. Taxonomy: Eukaryota; Viridiplantae; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Hordeum.

FUNCTION: THIS PROTEIN CONJUGATES WITH CHLOROPHYLL & 
CATALYZES THE PRIMARY LIGHT-INDUCED PHOTOCHEMICAL PROCESSES OF 
PHOTOSYSTEM II. SUBCELLULAR LOCATION: CHLOROPLAST THYLAKOID 
MEMBRANE. SIMILARITY: BELONGS TO THE PSBB / PSBC FAMILY. 

[1] SEQUENCE FROM N.A. STRAIN=CV. SABARLIS; MEDLINE; 89240047. Andreeva 
A.V., Buryakova A.A., Reverdatto S.V., Chakhmakhcheva O.G., Efimov V.A.; "Nucleotide 
sequence of the 5.2 kbp barley chloroplast DNA fragment, containing psbB-psbH-petB-petD 
gene cluster."; Nucleic Acids Res. 17:2859-2860(1989). 

[2] SEQUENCE FROM N.A. STRAIN=CV. SABARLIS; MEDLINE; 92207253. Efimov 
V.A., Andreeva A.V., Reverdatto S.V., Chakhmakhcheva O.G.; "Photosystem II of rye. 
Nucleotide sequence of the psbB, psbC, psbE, psbF, psbH genes of rye and chloroplast DNA 
regions adjacent to them."; Bioorg. Khim. 17:1369-1385(1991). 

[3] SEQUENCE OF 411-420. Hinz U.G.; "Isolation of the photosystem II reaction center 
complex from barley. Characterization by circular dichroism spectroscopy and amino acid
sequencing."; Carlsberg Res. Commun. 50:285-298(1985). 

1000. QRPTase. Quinolinate phosphoribosyl transferase. 

Quinolinate phosphoribosyl transferase (QPRTase) or nicotinate-nucleotide
pyrophosphorylase EC:2.4.2.19 is involved in the de novo synthesis of NAD in both
prokaryotes and eukaryotes. It catalyses the reaction of quinolinic acid with
5-phosphoribosyl-1-pyrophosphate (PRPP) in the presence of Mg2+ to give rise to nicotinic
acid mononucleotide (NaMN), pyrophosphate and carbon dioxide [1,2]. Number of members:
26.

[1] Medline: 97169443. A new function for a common fold: the crystal structure of quinolinic
acid phosphoribosyltransferase. Eads JC, Ozturk D, Wexler TO, Grubmeyer C, Sacchettini
JC; Structure 1997;5:47-58.

[2] Medline: 96139309. The sequencing, expression, purification, and steady-state kinetic
analysis of quinolinate phosphoribosyl transferase from Escherichia coli. Bhatia R, Calvo
KC; Arch Biochem Biophys 1996;325:270-278.

1001. R3H domain 

The name of the R3H domain comes from the characteristic spacing of the most conserved 
arginine and histidine residues. The function of the domain is predicted to be binding 
ssDNA. Number of members: 28 

[1] Medline: 99003905 The R3H motif: a domain that binds single-stranded nucleic acids.
Grishin NV; Trends Biochem Sci 1998;23:329-330. 

1002. recF protein signatures (RecF) 

The prokaryotic protein recF [1,2] is a single-stranded DNA-binding protein which also 
probably binds ATP. RecF is involved in DNA metabolism; it is required for recombinational 
DNA repair and for induction of the SOS response. RecF is a protein of about 350 to 370 
amino acid residues; there is a conserved ATP-binding site motif 'A' (P-loop) in the
N-terminal section of the protein as well as two other conserved regions, one located in the
central section, and the other in the C-terminal section. Signature patterns were derived from 
these two regions. 

Consensus pattern [LIVM]-x(4)-[LIF]-x(6)-[LIF]-[LVF]-x-[GE]-[GSTAD]-[PA]-x(2)-R-R-x-[FYW]-[LIVMF]-D
Sequences known to belong to this class detected by the pattern ALL.

Consensus pattern [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L
Sequences known to belong to this class detected by the pattern ALL, except for T. pallidum recF.

[1] Sandler S.J., Chackerian B., Li J.T., Clark A.J. Nucleic Acids Res. 20:839-845(1992).
[2] Alonso J.C., Fisher L.M. Mol. Gen. Genet. 246:680-686(1995).

1003. RibD C-terminal domain (RibD_C) 

The function of this domain is not known, but it is thought to be involved in riboflavin
biosynthesis. This domain is found in the C terminus of RibD/RibG Swiss:P25539, in
combination with dCMP_cyt_deam, as well as in isolation in some archaebacterial proteins
Swiss:P95872.

Number of members: 21 

1004. Ribosomal protein L16 signatures (Ribosomal_L16) 

Ribosomal protein L16 is one of the proteins from the large ribosomal subunit. In Escherichia
coli, L16 is known to bind directly to the 23S rRNA and to be located at the A site of the
peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities [1], groups: 

- Eubacterial L16. 

- Algal and plant chloroplast L16. 

- Cyanelle L16. 

- Plant mitochondrial L16.

L16 is a protein of 133 to 185 amino-acid residues. As signature patterns, we 
selected two conserved regions in the central section of these proteins. 

Consensus pattern [KR](2)-x-[GSAC]-[HIQVA]-[LIVM]-W-[LIVM]-[KR]-[LIVM]-[LFY]-[AP]
Sequences known to belong to this class detected by the pattern ALL.

Consensus pattern R-M-G-x-[GR]-K-G-x(4)-[FWKR]
Sequences known to belong to this class detected by the pattern ALL.

[1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

1005. Ribosomal protein L32e signature (Ribosomal_L32E)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis 
of sequence similarities. One of these families consists of: 

- Mammalian L32 [1]. 
- Drosophila RP49 [2].

- Trichoderma harzianum L32 [3]. 

- Yeast L32e (YBL092w). 

- Archaebacterial L32e [4]. 

These proteins have 135 to 240 amino-acid residues. As a signature pattern, a stretch of about 
20 residues located in the N-terminal part of these proteins was selected.

Consensus pattern F-x-R-x(4)-... [pattern truncated in source]
Sequences known to belong to this class detected by the pattern ALL.

[1] Jacks C.M., Powaser C.B., Hackett P.B. Gene 74:565-570(1988).
[2] Aguade M. Mol. Biol. Evol. 5:433-441(1988).
[3] Lora J.M., Garcia I., Benitez T., Llobell A., Pintor-Toro J.A. Nucleic Acids Res.
21:3319-3319(1993).
[4] Arndt E., Scholzen T., Kroemer W., Hatakeyama T., Kimura M. Biochimie 73:657-668(1991).

1006. (Ribosomal_S3) Ribosomal protein S3 signature 

PROSITE: PDOC00474. PROSITE cross-reference(s) PS00548; RIBOSOMAL_S3 

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. 
In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. It 
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], 
groups: 

-Eubacterial S3. 

-Algal and plant chloroplast S3. 

-Cyanelle S3. 

-Archaebacterial S3. 

-Plant mitochondrial S3. 
-Vertebrate S3.
-Insect S3. 

-Caenorhabditis elegans S3 (C23G10.3). 
-Yeast S3 (Rpl3). 

S3 is a protein of 209 to 559 amino-acid residues. A conserved region located in the C- 
terminal section was selected as a signature pattern. 

Consensus pattern [GSTA]-[KR]-x(6)-G-x-[LIV... (pattern truncated in source) ...
[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-[GS]. Sequences known to belong to this class
detected by the pattern ALL, except for some mitochondrial S3.

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

1007. RimM - RimM 

The RimM protein is essential for efficient processing of 16S rRNA [1]. The RimM protein
was shown to have affinity for free ribosomal 30S subunits but not for 30S subunits in the 
70S ribosomes [1]. Number of members: 14. 

[1] Medline: 98083058. RimM and RbfA are essential for efficient processing of 16S rRNA in
Escherichia coli. Bylund GO, Wipemo LC, Lundberg LA, Wikstrom PM; J Bacteriol
1998;180:73-82. 

1008. RNA_pol_A - RNA polymerase alpha subunit 

RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes
contain a single RNA polymerase compared to three in eukaryotes (not including
mitochondrial and chloroplast polymerases).

Members of this family include: A subunit from eukaryotes, gamma subunit from
cyanobacteria, beta' subunit from eubacteria, A' subunit from archaebacteria, B" from 
chloroplasts. Number of members: 139. 

[1] Medline: 97066998. Structural modules of the large subunits of RNA polymerase.
Introducing archaebacterial and chloroplast split sites in the beta and beta' subunits of 
Escherichia coli RNA polymerase. Severinov K, Mustaev A, Kukarin A, Muzzin O, Bass I, 
Darst SA, Goldfarb A; J Biol Chem 1996;271:27969-27974. 

1009. RuBisCO_large - Ribulose bisphosphate carboxylase large chain active site
PROSITE: PDOC00142; PROSITE cross-reference(s) PS00157; RUBISCO_LARGE

Ribulose bisphosphate carboxylase (EC 4.1.1.39) (RuBisCO) [1,2] catalyzes the
initial step in Calvin's reductive pentose phosphate cycle in plants as well as purple and green
bacteria. It consists of a large catalytic unit and a small subunit of undetermined function. In
plants, the large subunit is coded by the chloroplastic genome while the small subunit is
encoded in the nuclear genome. Molecular activation of RuBisCO by CO2 involves the
formation of a carbamate with the epsilon-amino group of a conserved lysine residue. This 
carbamate is stabilized by a magnesium ion. One of the ligands of the magnesium ion is an 
aspartic acid residue close to the active site lysine [3]. A pattern was developed which 
includes both the active site residue and the metal ligand, and which is specific to RuBisCO 
large chains. 

Consensus pattern G-x-[DN]-F-x-K-x-D-E [K is the active site residue] [The second D is a
magnesium ligand]. Sequences known to belong to this class detected by the pattern ALL,
except for Cheilopleuria biscuspis RuBisCO.
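
Because this pattern carries residue-level annotations, a search can report where the annotated residues fall as well as whether the pattern matches. The sketch below hand-translates the pattern into a regular expression with named groups for the active-site lysine and for the aspartate immediately preceding the final glutamate (taken here to be the "second D" of the annotation). The group names, the function `annotate` and the short test peptide are assumptions made for illustration; the peptide is not taken from the source.

```python
import re

# G-x-[DN]-F-x-K-x-D-E with the two annotated residues captured by name.
RUBISCO_LARGE = re.compile(
    r"G.[DN]F.(?P<active_site_lys>K).(?P<mg_ligand_asp>D)E"
)


def annotate(sequence: str):
    """Return the matched segment and 1-based positions of the annotated residues."""
    hits = []
    for m in RUBISCO_LARGE.finditer(sequence):
        hits.append({
            "match": m.group(0),
            "active_site_lys": m.start("active_site_lys") + 1,
            "mg_ligand_asp": m.start("mg_ligand_asp") + 1,
        })
    return hits


peptide = "AAGLDFTKDDEAA"  # illustrative peptide only
print(annotate(peptide))
```

Positions are reported 1-based, as is conventional for protein sequences; `re` itself returns 0-based offsets, hence the `+ 1`.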

[1] Miziorko H.M., Lorimer G.H. Annu. Rev. Biochem. 52:507-535(1983).
[2] Akazawa T., Takabe T., Kobayashi H. Trends Biochem. Sci. 9:380-383(1984).
[3] Andersson I., Knight S., Schneider G., Lindqvist Y., Lundqvist T., Branden C.-I., Lorimer
G.H. Nature 337:229-234(1989).

1010. Rve - Integrase core domain 

Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. 
Integrase is composed of three domains. The amino-terminal domain is a zinc binding
domain (Integrase_Zn). This domain is the central catalytic domain. The carboxyl-terminal
domain is a non-specific DNA binding domain (integrase). The catalytic domain acts as an
endonuclease when two nucleotides are removed from the 3' ends of the blunt-ended viral
DNA made by reverse transcription. This domain also catalyses the DNA strand transfer 
reaction of the 3' ends of the viral DNA to the 5' ends of the integration site [1]. Number of 
members: 694. 

[1] Medline: 95099322. Crystal structure of the catalytic domain of HIV-1 integrase:
similarity to other polynucleotidyl transferases. Dyda F, Hickman AB, Jenkins TM,
Engelman A, Craigie R, Davies DR; Science 1994;266:1981-1986.

1011. (SBP_bac_3) Bacterial extracellular solute-binding proteins, family 3 signature 
PROSITE: PDOC00798. PROSITE cross-reference(s) PS01039; SBP_BACTERIAL_3

Bacterial high affinity transport systems are involved in active transport of solutes 
across the cytoplasmic membrane. The protein components of these traffic systems include 
one or two transmembrane protein components, one or two membrane-associated ATP- 
binding proteins (ABC transporters; see <PDOC00185>) and a high affinity periplasmic 
solute-binding protein. The latter are thought to bind the substrate in the vicinity of the inner
membrane, and to transfer it to a complex of inner membrane proteins for concentration into 
the cytoplasm. 

In gram-positive bacteria, which are surrounded by a single membrane and have
therefore no periplasmic region, the equivalent proteins are bound to the membrane via an
N-terminal lipid anchor. These homologous proteins do not play an integral role in the transport
process per se, but probably serve as receptors to trigger or initiate translocation of the solute
through the membrane by binding to external sites of the integral membrane proteins of the
efflux system.

In addition at least some solute-binding proteins function in the initiation of sensory 
transduction pathways. 

On the basis of sequence similarities, the vast majority of these solute-binding 
proteins can be grouped [1] into eight families of clusters, which generally correlate with the 
nature of the solute bound. 

Family 3 groups together specific amino acids and opine-binding periplasmic proteins 
and a periplasmic homolog with catalytic activity: 

-Histidine-binding protein (gene hisJ) of Escherichia coli and related bacteria. A
homologous lipoprotein exists in Neisseria gonorrhoeae.

-Lysine/arginine/ornithine-binding proteins (LAO) (gene argT) of Escherichia coli and
related bacteria are involved in the same transport system as hisJ. Both solute-binding
proteins interact with a common membrane-bound receptor hisP of the binding protein
dependent transport system HisQMP.

-Glutamine-binding proteins (gene glnH) of Escherichia coli and Bacillus
stearothermophilus.

-Glutamate-binding protein (gene gluB) of Corynebacterium glutamicum.
-Arginine-binding proteins artI and artJ of Escherichia coli.
-Nopaline-binding protein (gene nocT) from Agrobacterium tumefaciens.
-Octopine-binding protein (gene occT) from Agrobacterium tumefaciens.
-Major cell-binding factor (CBF1) (gene: peb1A) from Campylobacter jejuni.
-Bacteroides nodosus protein aabA.
-Cyclohexadienyl/arogenate dehydratase of Pseudomonas aeruginosa, a periplasmic
enzyme which forms an alternative pathway for phenylalanine biosynthesis.
-Escherichia coli protein fliY.
-Vibrio harveyi protein patH.
-Escherichia coli hypothetical protein ydhW.
-Bacillus subtilis hypothetical protein yckB.
-Bacillus subtilis hypothetical protein yckK.

The signature pattern is located near the N-terminus of the mature proteins. 

Consensus pattern G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-[VAGC]-x(2)-[LIVMAGN]
Sequences known to belong to this class detected by the pattern ALL.

[1] Tam R., Saier M.H. Jr. Microbiol. Rev. 57:320-346(1993).

1012. Sec7 - Sec7 domain

The Sec7 domain is a guanine-nucleotide-exchange-factor (GEF) for the arf family [2].
Number of members: 32. 

[1] Medline: 98169075. Structure of the Sec7 domain of the Arf exchange factor ARNO.
Cherfils J, Menetrey J, Mathieu M, Le Bras G, Robineau S, Beraud-Dufour S, Antonny B,
Chardin P; Nature 1998;392:101-105.

[2] Medline: 97100951. A human exchange factor for ARF contains Sec7- and pleckstrin-
homology domains. Chardin P, Paris S, Antonny B, Robineau S, Beraud-Dufour S, Jackson
CL, Chabre M; Nature 1996;384:481-484.



1013. SecA_protein. SecA protein, amino terminal region 

SecA protein binds to the plasma membrane where it interacts with proOmpA to support
translocation of proOmpA through the membrane. SecA protein achieves this translocation, 
in association with SecY protein, in an ATP dependent manner. SecA possesses the ATPase 
activity. The carboxyl terminus has similarity with the helicase carboxyl terminus. See 
Ribosomal_L5. Number of members: 45. 

[1] Medline: 98309858. Amino-terminal region of SecA is involved in the function of SecG
for protein translocation into Escherichia coli membrane vesicles. Mori H, Sugiyama H, 
Yamanaka M, Sato K, Tagaya M, Mizushima S; J Biochem (Tokyo) 1998;124:122-129. 
[2]Medline: 89251629. SecA protein hydrolyzes ATP and is an essential component of the 
protein translocation ATPase of Escherichia coli. Lill R, Cunningham K, Brundage LA, Ito 
K, Oliver D, Wickner W; EMBO J 1989;8:961-966. 

1014. Seedstore_2S - 2S seed storage family 

Members of this family are composed of two chains (both included in the alignment); these
are co-translated and later cleaved. The two chains are linked together by disulphide bonds.
Number of members: 27.

[1] Medline: 97121264. 1H NMR assignment and global fold of napin BnIb, a representative
2S albumin seed protein. Rico M, Bruix M, Gonzalez C, Monsalve RI, Rodriguez R; 
Biochemistry 1996;35:15672-15682. 

1015. Smr - Smr domain 

This family includes the Smr (Small MutS Related) proteins, and the C-terminal region of the 
MutS2 protein. It has been suggested that this domain interacts with the MutS1 Swiss:P23909
protein in the case of Smr proteins and with the N-terminal MutS related region of MutS2 
Swiss:P94545 [1]. Number of members: 14. 

[1] Medline: 10431172. Smr: a bacterial and eukaryotic homologue of the C-terminal region
of the MutS2 family. Moreira D, Philippe H; Trends Biochem Sci 1999;24:298-300. 

1016. (SSF) Sodium:solute symporter family signatures and profile 

PROSITE: PDOC00429. PROSITE cross-reference(s) PS00456; NA_SOLUT_SYMP_1
PS00457; NA_SOLUT_SYMP_2 PS50283; NA_SOLUT_SYMP_3

It has been shown [1,2] that integral membrane proteins that mediate the intake of a
wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters)
can be grouped, on the basis of sequence and functional similarities, into a number of distinct
families. One of these families is known as the sodium:solute symporter family (SSF) and
currently consists of the following proteins:
-Mammalian Na+/glucose co-transporter. 
-Mammalian Na+/myo-inositol co-transporter. 
-Mammalian Na+/nucleoside co-transporter. 
-Mammalian Na+/neutral amino acid co-transporter. 
-Escherichia coli Na+/proline symporter (gene putP). 
-Escherichia coli Na+/pantothenate symporter (gene panF). 
-Escherichia coli hypothetical protein yidK. 
-Escherichia coli hypothetical protein yjcG. 
-Bacillus subtilis hypothetical protein ywcA (ipa-31R). 

These integral membrane proteins are predicted to comprise at least ten membrane 
spanning domains. Two conserved regions were selected as signature patterns; the first one is 
located in the fourth transmembrane region and the second one in a loop between two 
transmembrane regions in the C-terminal part of these proteins.

Consensus pattern [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-[TAV]-x(2)-G-G-[LMF]-x-[SAP]
Sequences known to belong to this class detected by the pattern ALL.

Consensus pattern [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-x-[LIVMGA]
Sequences known to belong to this class detected by the pattern ALL, except for E. coli yidK.

Note this documentation entry is linked to both a signature pattern and a profile. As the 
profile is much more sensitive than the pattern, you should use it if you have access to the 
necessary software tools to do so. 

[1] Reizer J., Reizer A., Saier M.H. Jr. Res. Microbiol. 141:1069-1072(1991).
[2] Reizer J., Reizer A., Saier M.H. Jr. Biochim. Biophys. Acta 1197:133-136(1994).

1017. SurE - Survival protein SurE 

E. coli cells with the surE gene disrupted are found to survive poorly in stationary phase [1]. 
It is suggested that SurE may be involved in stress response. Yeast also contains a member of 
the family Swiss:P38254. Swiss:P30887 can complement a mutation in acid phosphatase,
suggesting that members of this family could be phosphatases. Number of members: 17. 

[1] Medline: 95014035. A new gene involved in stationary-phase survival located at 59
minutes on the Escherichia coli chromosome. Li C, Ichikawa JK, Ravetto JJ, Kuo HC, Fu JC, 
Clarke S; J Bacteriol 1994;176:6015-6022. 

[2]Medline: 93046805. Complementation of Saccharomyces cerevisiae acid phosphatase 
mutation by a genomic sequence from the yeast Yarrowia lipolytica identifies a new 
phosphatase. Treton BY, Le Ball MT, Gaillardin CM; Curr Genet 1992;22:345-355. 

1018. Synuclein - Synuclein 

There are three types of synucleins in humans: alpha, beta and gamma.
Alpha synuclein has been found mutated in families with autosomal dominant Parkinson's 
disease. A peptide of alpha synuclein has also been found in amyloid plaques in Alzheimer's 
patients. Number of members: 12. 

[1] Medline: 98424410. The synuclein family. Lavedan C; Genome Res 1998;8:871-880.

1019. (T-box) T-box domain signatures 

PROSITE: PDOC00972. PROSITE cross-reference(s) PS01283; TBOX_1 PS01264;
TBOX_2

A number of eukaryotic DNA-binding proteins contain a domain of about 170 to 190
amino acids, known as the T-box domain [1,2,3], which probably binds DNA. The T-box
was first found in the mouse T locus (Brachyury) protein, a transcription factor involved
in mesoderm differentiation. It has since been found in the following proteins:
-Vertebrate and invertebrate homologs of the T protein.
-Mammalian proteins TBX1 to TBX6.
-Mammalian protein TBR1 which is expressed specifically in brain.
-Xenopus laevis eomesodermin (eomes).
-Xenopus laevis Vegt (or Antipodean), a transcription factor that activates the expression of
wnt-8, eomes and Brachyury.
-Chicken TbxT.
-Drosophila protein optomotor-blind (omb).
-Drosophila protein brachyenteron (byn) (also known as Trg), which is
required for the specification of the hindgut and anal pads.
-Drosophila protein H15.
-Caenorhabditis elegans protein tbx-12.
-Caenorhabditis elegans hypothetical proteins F21H11.3, F40H6.4, T07C4.2, T07C4.6 and
ZK177.10.

Two conserved regions were selected as signature patterns for the T-domain. The first region 
corresponds to the N-terminal of the domain and the second one to the central part. 

Consensus pattern L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ]
Sequences known to belong to this class detected by the pattern ALL, except for C. elegans
ZK177.10.

Consensus pattern [LIVMYW]-H-[PADH]-[DE... [pattern truncated in source]
Sequences known to belong to this class detected by the pattern ALL, except for C. elegans
tbx-12, ZK177.10 and Drosophila H15.

[1] Bollag R.J., Siegfried Z., Cebra-Thomas J.A., Garvey N., Davison E.M., Silver L.M. Nat.
Genet. 7:383-389(1994).
[2] Agulnik S.I., Garvey N., Hancock S., Ruvinsky I., Chapman D.L., Agulnik I., Bollag R.J.,
Papaioannou V.E., Silver L.M. Genetics 144:249-254(1996).
[3] Papaioannou V.E. Trends Genet. 13:212-213(1997).

1020. Toprim - Toprim domain 

This is a conserved region from DNA primase. This corresponds to the Toprim domain
common to DnaG primases, topoisomerases, OLD family nucleases and RecR proteins [1].
Both DnaG motifs IV and V are present in the alignment; the DxD (V) motif may be involved
in Mg2+ binding and mutations to the conserved glutamate (IV) completely abolish DnaG
type primase activity [1]. DNA primase EC:2.7.7.6 is a nucleotidyltransferase; it synthesizes
the oligoribonucleotide primers required for DNA replication on the lagging strand of the
replication fork; it can also prime the leading strand and has been implicated in cell division
[2]. Number of members: 133.

[1] Medline: 98391745. Toprim-a conserved catalytic domain in type IA and II
topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Aravind L,
Leipe DD, Koonin EV; Nucleic Acids Res 1998;26:4205-4213.

[2] Medline: 97368180. Cloning and analysis of the dnaG gene encoding Pseudomonas putida
DNA primase. Szafranski P, Smith CL, Cantor CR; Biochim Biophys Acta 1997;1352:243-248.

[3]Medline: 94124015. The Haemophilus influenzae dnaG sequence and conserved bacterial 
primase motifs. Versalovic J, Lupski JR; Gene 1993;136:281-286. 

1021. TraB - TraB family 

pAD1 is a hemolysin/bacteriocin plasmid originally identified in Enterococcus faecalis DS16.
It encodes a mating response to a peptide sex pheromone, cAD1, secreted by recipient
bacteria. Once the plasmid pAD1 is acquired, production of the pheromone ceases--a trait
related in part to a determinant designated traB. However a related protein is found in C. 
elegans Swiss:Q94217, suggesting that members of the TraB family have some more general 
function. Number of members: 12. 

[1] Medline: 94302142. Characterization of the determinant (traB) encoding sex pheromone
shutdown by the hemolysin/bacteriocin plasmid pAD1 in Enterococcus faecalis. An FY,
Clewell DB; Plasmid 1994;31:215-221. 

1022. (Transpo_mutator) Transposases, Mutator family, signature 
PROSITE: PDOC00770. PROSITE cross-reference(s) PS01007; 
TRANSPOSASE_MUTATOR 

Autonomous mobile genetic elements such as transposons or insertion sequences (IS) 
encode an enzyme, called transposase, required for excising and inserting the mobile element. 
On the basis of sequence similarities, transposases can be grouped into various families. One 
of these families has been shown [1,2,3,E1] to consist of transposases from the following 
elements: 

-Mutator from Maize. 
-Is1201 from Lactobacillus helveticus. 
-Is905 from Lactococcus lactis. 
-Is1081 from Mycobacterium bovis. 
-Is6120 from Mycobacterium smegmatis. 
-Is406 from Pseudomonas cepacia. 
-IsRm3 from Rhizobium meliloti. 
-IsRm5 from Rhizobium meliloti. 
-Is256 from Staphylococcus aureus. 
-IsT2 from Thiobacillus ferrooxidans. 

The maize Mutator transposase (MudrA) is a protein of 823 amino acids; the bacterial 
transposases listed above are proteins of 300 to 420 amino acids. These proteins contain a 
conserved domain of about 130 residues; a signature pattern was derived from the most 
conserved part of this domain. 

Consensus pattern: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)- 
[QR]-x-C-x(2)-H. Sequences known to belong to this class detected by the pattern: ALL. 
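
By way of illustration only, a consensus pattern written in this notation can be translated into a regular expression and used to scan a candidate polypeptide sequence. The following is a minimal sketch, assuming Python; the function name prosite_to_regex and the test sequence are hypothetical and are not part of the PROSITE entry. It handles only the notation used in the patterns of this section (single residues, x, bracketed sets, braces and repeat counts).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (3) or (9,11).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} lists residues that are not allowed
        # A bracketed set [..] or a single letter is already valid regex syntax.
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

MUTATOR_PATTERN = ("D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]"
                   "-x(2)-[QR]-x-C-x(2)-H")
mutator_regex = prosite_to_regex(MUTATOR_PATTERN)

# Synthetic sequence built to satisfy the pattern; it is not a real transposase.
toy_sequence = "MK" + "DAAAGLAAAAAASLPASAAQACAAH" + "KK"
hit = re.search(mutator_regex, toy_sequence)
print(mutator_regex)
print(hit.start() + 1 if hit else None)  # 1-based start of the match, if any
```

A match to the pattern alone is only suggestive; the entry above reports that all known members are detected, not that every hit is a member.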

[1]Eisen J.A., Benito M-L, Walbot V. Nucleic Acids Res. 22:2634-2636(1994). 
[2]Guilhot C., Gicquel B., Davies J., Martin C. Mol. Microbiol. 6:107-113(1992). 
[3]Wood M.S., Byrne A., Lessie T.G. Gene 105:101-105(1991). 

1023. Transposase_8 - Transposase 

Transposase proteins are necessary for efficient DNA transposition. This family 
consists of various E. coli insertion elements and other bacterial transposases some of which 
are members of the IS3 family. Number of members: 58. 

[1]Medline: 97324595. Genetic organization and transposition properties of IS511. D. A. 
Mullin, D. L. Zies, A. H. Mullin, N. Caballera & B. Ely; Mol Gen Genet 1997;254:456-463. 
[2]Medline: 97128810. The use of an improved transposon mutagenesis system for DNA 
sequencing leads to the characterization of a new insertion sequence of Streptomyces lividans 
66. J. Fischer, H. Maier, P. Viell & J. Altenbuchner; Gene 1996;180:81-89. 
[3]Medline: 97074647. Identification and nucleotide sequence of Rhizobium meliloti 
insertion sequence ISRm6, a small transposable element that belongs to the IS3 family. S. 
Zekri & N. Toro; Gene 1996;175:43-48. 

1024. tRNA_int_endo - tRNA intron endonuclease 

Members of this family cleave pre-tRNA at the 5' and 3' splice sites to release the intron 
(EC:3.1.27.9). Number of members: 8. 

[1]Medline: 97344075. Properties of H. volcanii tRNA intron endonuclease reveal a 
relationship between the archaeal and eucaryal tRNA intron processing systems. Kleman-Leyer K, 
Armbruster DW, Daniels CJ; Cell 1997;89:839-847. 

1025. Urease - Urease signatures 

PROSITE: PDOC00133. PROSITE cross-reference(s) PS01120; UREASE_1. PS00145; 
UREASE_2 

Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea 
to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in 
1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease 
is a hexamer of identical chains. In bacteria [2], it consists of either two or three different 
subunits (alpha, beta and gamma). 

Urease binds two nickel ions per subunit; four histidines, an aspartate and a 
carbamated lysine serve as ligands to these metals; an additional histidine is involved in the 
catalytic mechanism [3]. 

As signatures for this enzyme, a region that contains two histidines that bind one of the 
nickel ions and the region of the active site histidine were selected. 

Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind 
nickel]. Sequences known to belong to this class detected by the pattern: ALL. 
Consensus pattern: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the 
active site residue]. Sequences known to belong to this class detected by the pattern: ALL. 
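
The two signatures above each mark specific histidines, so a scan can report the positions of those residues rather than only whether a match exists. The following is a minimal sketch, assuming Python; the regular expressions are hand translations of the two consensus patterns with the annotated histidines placed in capture groups, and the function name urease_histidines and the test sequence are hypothetical.

```python
import re

# UREASE_1: the two captured H residues are the nickel ligands.
UREASE_1 = re.compile(r"T[AY][GA][GAT][LIVM]D.(H)[LIVM](H).{3}P")
# UREASE_2: the captured H is the active site residue.
UREASE_2 = re.compile(r"[LIVM]{2}[CT](H)[HN]L.{3}[LIVM].{2}D[LIVM].FA")

def urease_histidines(sequence):
    """Return 1-based positions of the signature histidines found in sequence."""
    found = {"nickel_binding": [], "active_site": []}
    for m in UREASE_1.finditer(sequence):
        found["nickel_binding"].append((m.start(1) + 1, m.start(2) + 1))
    for m in UREASE_2.finditer(sequence):
        found["active_site"].append(m.start(1) + 1)
    return found

# Synthetic sequence containing one copy of each signature; not a real urease.
toy = "MM" + "TAGGLDAHLHAAAP" + "GG" + "LLCHHLAAALAADLAFA"
print(urease_histidines(toy))
```

Positions are reported 1-based, as is conventional for protein sequences.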

[1]Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988). 

[2]Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989). 

[3]Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995). 

1026. Urease_beta - Urease beta subunit. 

This subunit is known as alpha in Helicobacter. Number of members: 35. 

[1]Medline: 95273988. The crystal structure of urease from Klebsiella aerogenes. Jabri E, 
Carr MB, Hausinger RP, Karplus PA; Science 1995;268:998-1004. 



1027. UvrD-helicase - UvrD/REP helicase 

The Rep family helicases are composed of four structural domains. The Rep family helicases 
function as dimers. REP helicases catalyse ATP-dependent unwinding of double-stranded DNA to 
single-stranded DNA. Swiss:P23478 and Swiss:P08394 have large insertions near the 
carboxy-terminus relative to other members of the family. Number of members: 52. 

[1] Medline: 97433075. Major domain swiveling revealed by the crystal structures of 
complexes of E. coli Rep helicase bound to single-stranded DNA and ADP. Korolev S, Hsieh 
J, Gauss GH, Lohman TM, Waksman G; Cell 1997;90:635-647. 

1028. V-type ATPase 116kDa subunit family (V_ATPase_sub_a) 

This family consists of the 116kDa V-type ATPase (vacuolar (H+)-ATPases) subunits, as 
well as V-type ATP synthase subunit i. The V-type ATPases are proton pumps that 
acidify intracellular compartments in eukaryotic cells, for example yeast central vacuoles, 
clathrin-coated vesicles and synaptic vesicles. They have important roles in membrane 
trafficking processes [1]. The 116kDa subunit (subunit a) in the V-type ATPase is part of the V0 
functional domain responsible for proton transport. The a subunit is a transmembrane 
glycoprotein with multiple putative transmembrane helices; it has a hydrophilic amino 
terminal and a hydrophobic carboxy terminal [1,2]. It has roles in proton transport and 
assembly of the V-type ATPase complex [1,2]. This subunit is encoded by two homologous 
genes in yeast, VPH1 and STV1 [2]. 
Number of members: 27 

[1] Forgac M; Medline: 99240666 Structure and properties of the vacuolar (H+)-ATPases. 
J Biol Chem 1999;274:12951-12954. 

[2] Forgac M; Medline: 99270697 Structure and properties of the clathrin-coated vesicle and 
yeast vacuolar V-ATPases. J Bioenerg Biomembr 1999;31:57-65. 

1029. Viral (Superfamily 1) RNA helicase (Viral_helicase1) 
Number of members: 260 

[1] Koonin EV, Dolja VV; Medline: 94094568 Evolution and taxonomy of positive-strand 
RNA viruses: implications of comparative analysis of amino acid sequences. Crit Rev 
Biochem Mol Biol 1993;28:375-430. 

1030. Vesicular monoamine transporter (VMAT) 

This family consists of various vesicular amine transporters with 12 transmembrane helices. 
These include vesicular acetylcholine transporters (VAChT) [3] and vesicular monoamine 
transporters (VMATs) [1,2], isoforms 1 (adrenal) and 2 (brain) (VMAT1 and VMAT2). 

These proteins transport biogenic amines into synaptic vesicles or chromaffin granules [4]. 
VMATs pack monoamine neurotransmitters into secretory vesicles for regulated exocytotic 
release; they also protect against the parkinsonian neurotoxin MPP+ by transporting it into 
vesicles, preventing it from acting on mitochondria [1]. 

Also in the family is C. elegans UNC-17, a putative vesicular acetylcholine transporter; 
mutations in UNC-17 cause impaired neuromuscular function, giving rise to jerky or 
uncoordinated movement [4]. 
Number of members: 15 

[1] Krantz DE, Peter D, Liu Y, Edwards RH; Medline: 97197857 Phosphorylation of a 
vesicular monoamine transporter by casein kinase II. J Biol Chem 1997;272:6752-6759. 
[2] Erickson JD, Varoqui H, Schafer MK, Modi W, Diebler MF, Weihe E, Rand J, Eiden LE, 
Bonner TI, Usdin TB; Medline: 94350930 Functional identification of a vesicular 
acetylcholine transporter and its expression from a 'cholinergic' gene locus. J Biol Chem 
1994;269:21929-21932. 
[3] Erickson JD, Schafer MK, Bonner TI, Eiden LE, Weihe E; Medline: 96209876 Distinct 
pharmacological properties and distribution in neurons and endocrine cells of two isoforms of 
the human vesicular monoamine transporter. Proc Natl Acad Sci U S A 1996;93:5166-5171. 
[4] Alfonso A, Grundahl K, Duerr JS, Han HP, Rand JB; Medline: 3342494 The 
Caenorhabditis elegans unc-17 gene: a putative vesicular acetylcholine transporter. Science 
1993;261:617-619. 

1031. WW/rsp5/WWP domain signature and profile. Cross-reference(s): PS01159; 
WW_DOMAIN_1; PS50020; WW_DOMAIN_2 

The WW domain [1-4,E1] (also known as rsp5 or WWP) was originally discovered as a 
short conserved region in a number of unrelated proteins, among them dystrophin, the gene 
responsible for Duchenne muscular dystrophy. The domain, which spans about 35 residues, 
is repeated up to 4 times in some proteins. It has been shown [5] to bind proteins with 
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It 
appears to contain beta-strands grouped around four conserved aromatic positions, generally 
Trp. The name WW or WWP derives from the presence of these Trp as well as that of a 
conserved Pro. It is frequently associated with other domains typical for proteins in signal 
transduction processes. 

Proteins containing the WW domain are listed below. 

-Dystrophin, a multidomain cytoskeletal protein. Its longest alternatively spliced form 
consists of an N-terminal actin-binding domain, followed by 24 spectrin-like repeats, a 
cysteine-rich calcium-binding domain and a C-terminal globular domain. Dystrophin forms 
tetramers and is thought to have multiple functions including involvement in membrane 
stability, transduction of contractile forces to the extracellular environment and organization 
of membrane specialization. Mutations in the dystrophin gene lead to muscular dystrophy of 
Duchenne or Becker type. Dystrophin contains one WW domain C-terminal of the spectrin- 
repeats. 

-Utrophin, a dystrophin-like protein of unknown function. 

-Vertebrate YAP protein is a substrate of an unknown serine kinase. It binds to the SH3 
domain of the Yes oncoprotein via a proline-rich region. This protein appears in alternatively 
spliced isoforms, containing either one or two WW domains [6]. 

-Mouse NEDD-4 plays a role in the embryonic development and differentiation of the 
central nervous system. It contains 3 WW modules followed by a HECT domain. The 
human ortholog contains 4 WW domains, but the third WW domain is probably spliced 
resulting in an alternate NEDD-4 protein with only 3 WW modules [3]. 

-Yeast RSP5 is similar to NEDD-4 in its molecular organization. It contains an N-terminal 
C2 domain (see <PDOC00380>), followed by a histidine-rich region, 3 WW domains and a 
HECT domain. 

-Rat FE65, a transcription-factor activator expressed preferentially in liver. The activator 
domain is located within the N-terminal 232 residues of FE65, which also contain the WW 
domain. 

-Yeast ESS1/PTF1, a putative peptidyl prolyl cis-trans isomerase from family ppiC (see 
<PDOC00840>). A related protein, dodo (gene dod) exists in Drosophila and in mammals 
(gene PIN1). 

-Tobacco DB10 protein. The WW domain is located N-terminal to the region with 
similarity to ATP-dependent RNA helicases. 

-IQGAP, a human GTPase activating protein acting on ras. It contains an N-terminal 
domain similar to fly muscle mp20 protein and a C-terminal ras GTPase activator domain. 

-Yeast pre-mRNA processing protein PRP40, Caenorhabditis elegans ZK1098.1 and fission 
yeast SpAC13C5.02 are related proteins with similarity to MYO2-type myosin, each 
containing two WW-domains at the N-terminus. 

-Caenorhabditis elegans hypothetical protein C38D4.5, which contains one WW module, a 
PH domain (see <PDOC50003>) and a C-terminal phosphatidylinositol 3-kinase domain. 

-Yeast hypothetical protein YFL010c. 

For the sensitive detection of WW domains, a profile was developed which spans the whole 
homology region as well as a pattern. 

Description of pattern(s) and/or profile(s): 
Consensus pattern: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P. 
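
Because the WW domain may be repeated within one protein and the pattern contains variable-length gaps, x(9,11) and x(6,7), a scan should report every non-overlapping hit together with its extent. The following is a minimal sketch, assuming Python; the regular expression is a hand translation of the consensus pattern, and the function name ww_spans and the test sequence are hypothetical.

```python
import re

# Hand translation of W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.
# The ranges x(9,11) and x(6,7) become the bounded quantifiers {9,11} and {6,7}.
WW_PATTERN = re.compile(r"W.{9,11}[VFY][FYW].{6,7}[GSTNE][GSTQCR][FYW].{2}P")

def ww_spans(sequence):
    """Return 1-based inclusive (start, end) spans of non-overlapping pattern hits."""
    return [(m.start() + 1, m.end()) for m in WW_PATTERN.finditer(sequence)]

# Two synthetic units using the shortest and the longest allowed gaps;
# neither is a real WW domain.
short_unit = "W" + "A" * 9 + "VF" + "A" * 6 + "GGW" + "AA" + "P"
long_unit = "W" + "A" * 11 + "VF" + "A" * 7 + "GGW" + "AA" + "P"
toy = short_unit + "KKKK" + long_unit

for start, end in ww_spans(toy):
    print(start, end, end - start + 1)
```

The pattern covers only part of the domain, so the reported spans are shorter than the approximately 35 residues of the full homology region; the profile mentioned above is the more sensitive detector.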

[1] Bork P., Sudol M. Trends Biochem. Sci. 19:531-533(1994). 
[2] Andre B., Springael J.Y. Biochem. Biophys. Res. Commun. 205:1201-1205(1994). 
[3] Hofmann K.O., Bucher P. FEBS Lett. 358:153-157(1995). 
[4] Sudol M., Chen H.I., Bougeret C., Einbond A., Bork P. FEBS Lett. 369:67-71(1995). 
[5] Chen H.I., Sudol M. Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823(1995). 
[6] Sudol M., Bork P., Einbond A., Kastury K., Druck T., Negrini M., Huebner 
K., Lehman D. J. Biol. Chem. 270:14733-14741(1995). 

1032. XPA protein signatures. PROSITE cross-reference(s): PS00752; XPA_1. 
PS00753; XPA_2. 

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease, 
characterized by a high incidence of sunlight-induced skin cancer. Skin cells 
of people with this condition are hypersensitive to ultraviolet light, due 
to defects in the incision step of DNA excision repair. There are a minimum of 
seven genetic complementation groups involved in this pathway: XP-A to XP-G. 
XP-A is the most severe form of the disease and is due to defects in a 30 Kd 
nuclear protein called XPA (or XPAC) [2]. 

The sequence of the XPA protein is conserved from higher eukaryotes [3] to 
yeast (gene RAD14) [4]. XPA is a hydrophilic protein of 247 to 296 amino-acid 
residues which has a C4-type zinc finger motif in its central section. 

Two signature patterns were developed for XPA proteins. The first corresponds to the 
zinc finger region, the second to a highly conserved region located some 12 residues after the 
zinc finger region. 

Consensus pattern: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C 
Consensus pattern: [LIVM](2)-T-[KR]-T-E^ 

[1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994). 
[2] Miura N., Miyamoto I., Asahina H., Satokata I., Tanaka K., Okada Y. J. Biol. Chem. 
266:19786-19789(1991). 
[3] Shimamoto T., Kohno K., Tanaka K., Okada Y. Biochem. Biophys. Res. Commun. 
181:1231-1237(1991). 
[4] Bankmann M., Prakash L., Prakash S. Nature 355:555-558(1992). 

1033. YCF9 

This family consists of the hypothetical protein product of the YCF9 gene from 
chloroplasts and cyanobacteria. Number of members: 16 

1034. (DUF15) 

It is highly conserved between eubacteria and eukaryotes. 
Number of members: 30 

1035. Lumenal portion of Cytochrome b559, alpha (gene psbE) subunit. (cytochr_b559a) 

This family is the lumenal portion of cytochrome b559 alpha chain; matches to this family 
should be accompanied by a match to the cytochr_b559 family also. The Prosite pattern 
matches the transmembrane region of the cytochrome b559 alpha and beta subunits. 
Number of members: 16 



A. Asparaginase 2 

Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be 
associated with the cell wall and whose expression is affected by the availability of nitrogen. 
Asparaginase II catalyzes the reaction L-asparagine + H2O = L-aspartate + NH3. As 
many leukemias have high requirements for aspartic acid, asparaginase II proteins are useful 
as reagents for screening compounds for activity as leukemia chemotherapy products. 
Asparaginase II protein can also be over- or under-expressed to alter amino acid content in 
plant tissues or to modify nitrogen fixation and/or nitrogen metabolism in plants. 
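
As a consistency check on the reaction stated above, the atoms on each side can be tallied. The following is a minimal sketch, assuming Python; the molecular formulas are those of the neutral species (L-asparagine C4H8N2O3, water H2O, L-aspartic acid C4H7NO4, ammonia NH3) and are supplied here for illustration rather than taken from the text above, and the helper name atom_totals is hypothetical.

```python
from collections import Counter

# Elemental composition of each neutral species (assumed for this illustration).
FORMULAS = {
    "L-asparagine": {"C": 4, "H": 8, "N": 2, "O": 3},
    "water": {"H": 2, "O": 1},
    "L-aspartic acid": {"C": 4, "H": 7, "N": 1, "O": 4},
    "ammonia": {"N": 1, "H": 3},
}

def atom_totals(species):
    """Sum the atoms of each element over a list of species names."""
    total = Counter()
    for name in species:
        total.update(FORMULAS[name])
    return total

reactants = atom_totals(["L-asparagine", "water"])
products = atom_totals(["L-aspartic acid", "ammonia"])
balanced = reactants == products
print(dict(reactants), dict(products), balanced)
```

Both sides total C4 H10 N2 O4, so the hydrolysis as written is balanced with one water consumed per asparagine.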

Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12 

B. Chloroa_b-bind 

Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast 
and bind chlorophyll a and chlorophyll b, thereby triggering a chemical reaction 
(photosynthesis). These proteins are useful in controlling the rate, efficiency and/or output of 
photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to increase 
the rate of photosynthesis. 

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64 
Brandt et al. (1992) Plant Mol Biol 19: 699-703 



C. DMRL synthase 

DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the penultimate step in 
riboflavin (vitamin B2) synthesis, condensing 
5-amino-6-(1'-D)-ribityl-amino-2,4(1H,3H)-pyrimidinedione with 
L-3,4-dihydroxy-2-butanone 4-phosphate to produce 6,7-dimethyl-8-(1-D-ribityl)lumazine. 
The enzyme forms a homopentamer. Engineering of these 
proteins or those with homologous sequences/structures may allow control of the amounts of 
vitamin B2 available in plants and/or accumulation of pigment, as well as altering reactions 
requiring hydrogen ion carriers/transmitters. 

Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7 

D. E1_N 

These proteins are ATP-dependent DNA helicases that are required for initiation of viral 
DNA replication. They form a complex with the viral E2 protein. The E1-E2 complex binds 
to the replication origin that contains binding sites for both proteins. The majority of 
sequences known for this group of proteins are from various papillomaviruses, a type of 
double stranded DNA virus. In plants, the prototype double stranded DNA virus is 
Cauliflower Mosaic virus (CaMV). Manipulation of these proteins, especially to produce 
variant proteins that form non-productive complexes, enables production of plants that are 
resistant to infection by double stranded DNA viruses. 

Ref: Yang et al. (1993) PNAS USA 90: 5086-90 
Ustav and Stenlund (1991) EMBO J 10: 449-57 
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8 

E. EF1_G 

Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma 
subunits are presumed to play a role in anchoring the complex to other cellular components. 
Studies of EF-1 genes in plants suggest that different forms of the EF-1 subunits may be 
expressed in particular organs or in response to stress. Manipulation of the activity of these 
proteins, either by altered expression level or by structural mutation, may result in the 
accumulation of a particular protein in a chosen organ or allow production of particular 
proteins during stress conditions. 

Ref: Kinzy et al. (1994) NAR 22: 2703-7 
Dunn et al. (1993) Plant Mol Biol 23: 221-5 
Aguilar et al. (1991) Plant Mol Biol 17: 351-60 

F. ENV_polyprotein 

This family comprises the envelope or coat proteins known from a number of different 
retroviruses. In mammalian species, retroviruses are responsible for diseases such as 
leukemia and AIDS. In plants, retroviruses are known in both monocot (e.g. Zeon-1) and dicot 
(e.g. Arabidopsis and tobacco) species and have been shown to induce mutant alleles at new 
loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous 
or introduced retroviruses, in essence generating a new method for mutant production, gene 
tagging and the like. 

Ref: Mamoun et al. (1990) J Virol 64: 4180-8 
Grandbastien et al. (1989) Nature 337: 376-80 
Wright and Voytas (1998) Genetics 149: 703-15 

G. Glycosyl_hydr9 

Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) 
catalyze the endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant 
proteins with this domain exist and are expressed in an organ specific manner. They are 
involved in the fruit ripening process, in cell elongation and plant reproduction. Modulation 
of the activity of these proteins, either by over- or under-expression or by mutation of the 
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for 
engineering reproductive sterility. 

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9 
Tucker et al. (1988) Plant Physiol 88: 1257-62 
Shani et al. (1997) 43: 837-42 
Milligan and Gasser (1995) Plant Mol Biol 28: 691-711 

H. Glycosyl_hydr14 

The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 
1,4-α-glucosidic linkages in polysaccharides and remove successive maltose units from the 
non-reducing ends of the chains. Mutants of β-amylase in Arabidopsis exhibited altered 
degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes 
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also 
influence the amount of pigment stored within particular cells. Manipulation of the β-amylase 
genes enables control of plant pigmentation (for example, fibre pigment in cotton) as well as 
carbohydrate synthesis and degradation. 

Ref: Zeeman et al. (1998) Plant J 15: 357-65 
Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 
Kitamoto et al. (1988) J Bacteriol 170: 5848-54 

I. Glycosyl_hydr15 

Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze 
the hydrolysis of terminal 1,4-linked alpha-D-glucose residues successively from the 
non-reducing ends of the chains, resulting in the release of β-D-glucose. In plants these proteins 
have been tied to the mobilization of the xyloglucan stored in the cotyledonary cell walls. 
Proteins such as these could be varied to affect the rate of plant growth (for example during 
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of 
the properties, such as elasticity, of plant cell walls. 

Ref: Crombie et al. (1998) Plant J 15: 27-38 
Hata et al. (1991) Agric Biol Chem 55: 941-9 



J. Glycosyl_hydr20 



Attorney No. 2750-1237P 

Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal 
non-reducing N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides. 
N-acetyl-β-glucosaminidase belongs to this family and exists in several different forms (consisting of 
various combinations of alpha and beta chains) depending on the organism. Family 20 
glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff 
disease) and glycogen storage disease in humans. These types of proteins are also 
responsible for the hydrolysis of chitin. In plants, these proteins could be useful in 
controlling carbohydrate catabolism, thereby influencing the amount of sugars available for 
storage and/or use in other metabolic pathways. In addition, it is possible that such proteins 
could be used to engineer an endogenous insect protection mechanism, e.g. by secretion of a 
chitin-hydrolyzing composition by the plant. 

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9 
O'Dowd et al. (1988) Biochemistry 27: 5216-26 

K. HMG box 

The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. 
Numerous plant proteins contain this domain, such as the HMG1/2-like proteins. The 
expression of some of these HMG proteins appears to be regulated by circadian rhythms and 
in a light-dependent manner, occurring at higher levels in roots, for example, and at lower levels 
in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence 
transcription regulation. In plants, HMGs are believed to have a role in maintaining patterns 
of circadian-regulated expression for other genes, suggesting that these proteins could be 
exploited to control growth and development. 

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501 
Zheng et al. (1993) Plant Mol Biol 23: 813-23 
Grasser et al. (1993) Plant Mol Biol 23: 619-25 

L. IL2 

Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic 
stimulation and is crucial for proper regulation and functioning of the immune response. IL-2 
is capable of stimulating B cells, monocytes, lymphokine-activated killer cells, natural killer 
cells and glioma cells. Plant extracts have also been shown to stimulate the immune system 
(for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in 
feedback inhibition pathways that impact the inflammatory response as well as the growth 
inhibition of tumor reactive T cells. Plant proteins containing IL-2-like sequences are useful 
as immunity-based therapeutics, acting in a manner similar to IL-2 in mammals. 

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 
Ariel et al. (1998) J Immunol 161: 2465-72 
Schink (1997) Anticancer Drugs 8 Suppl 1: S47-51 

M. Oxidored_FMN 

NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced 
acceptor. One member of this family is yeast "old yellow enzyme" (OYE), which is thought to 
be involved in oxylipin metabolism. A second yeast family member is an 
estrogen binding protein (EBP) that also exhibits oxidoreductase activity. An 
Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants 
have been reported. Plant proteins from this class have the potential to be used to modify 
lipid metabolism/catabolism. These proteins may also have use as therapeutics for breast and 
prostate cancer, and other abnormal growth in steroid-sensitive tissues. 

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 
Schaller and Weiler (1997) J Biol Chem 272: 28066-72 
Mandani et al. (1994) PNAS USA 91: 922-6 

N. Oxidored_q2 

The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = 
NAD(+) + plastoquinol. In plants these reactions occur in the chloroplast and are believed to 
participate in a chloroplast respiratory system. Here, the NDH complex is postulated to act as 
a valve to remove excess reduction equivalents in the chloroplasts. Manipulation of these 
proteins may improve the rate or efficiency of photosynthesis. 

Ref: Burrows et al. (1998) EMBO J 17: 868-76 
Kofer et al. (1998) Mol Gen Genet 258: 166-73 
Maier et al. (1995) J Mol Biol 251: 614-28 

O. PABP 

Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by 
Arabidopsis, contain numerous PABP genes that are expressed in an organ-specific manner. 
For example, PABP2 is functional in roots and shoots, while PABP5 is expressed 
predominantly in immature flowers. The PABP proteins are implicated in numerous aspects 
of posttranscriptional regulation including mRNA turnover and translational initiation. 
Control of activity of PABP proteins provides the ability to control the expression of various 
genes in particular organs during development. 

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33 
Belostotsky and Meagher (1993) PNAS USA 90: 6686-90 

P. Parvo_coat 

Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid 
proteins. Plants are susceptible to infection by single-stranded DNA viruses such as Maize 
streak virus (MSV) and various geminiviruses. The coat proteins in these plant viruses are 
critical to the virus life cycle within the plant. For example, the coat protein of MSV is 
thought to be involved in intra- and inter-cellular movement within the plant. Engineering of 
proteins having similarity to parvoviral coat proteins, especially to produce proteins that 
interfere with maturation of the virus particle, enables the production of plants having better 
resistance to natural plant single-stranded DNA viruses. 

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70 
Rohde et al. (1990) Virology 176: 648-51 

Q. Pkinase_C 

Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and 
are known to undergo serine-specific autophosphorylation and specifically phosphorylate two 
ribosomal proteins, P14 and P16. During development, these proteins predominate during 
high metabolic activity in growing buds, root tips, leaf margins and germinating seeds. They 
are thought to be involved in the control of plant growth and development. In addition, two 
genes encoding proteins from this family have been described that help plant cells adapt 
during cold or high salt stresses. Consequently, engineering Pkinase_C proteins provides a 
way to control general growth/development of the plant as well as a means to provide 
endogenous protection against environmental stresses. 

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 

Mizoguchi et al. (1995) FEBS Lett 358: 199-204 

R. REV 

The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV 
production in retroviruses such as Human Immunodeficiency Virus type 1 (HIV-1). Plants 
contain retrovirus-like viruses such as pararetroviruses and retrotransposons (i.e. transposons 
having long terminal repeats). Plant retrotransposons in particular have been used to create 
mutations at various loci, thereby permitting gene isolation, gene tagging and the like. 
Manipulation of plant REV proteins enables control of transposition frequencies of 
corresponding transposable elements and provides a new tool for genetic engineering of 
plants. 

Ref: Sodroski et al. (1986) Nature 321: 412-7 
Franchini et al. (1989) PNAS USA 86: 2433-7 
Marquet et al. (1995) 77: 113-24 
Grandbastien et al. (1989) Nature 337: 376-80 
Wright and Voytas (1998) Genetics 149: 703-15 

S. RuBisCo_small 

Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the 
C3 photosynthetic carbon reduction cycle, adding carbon dioxide to D-ribulose 
1,5-bisphosphate to form two molecules of 3-phospho-D-glycerate. RuBisCo is comprised of 
two subunits, one large, which is synthesized in the chloroplast, and one small, which is 
synthesized in the cytoplasm and then transported into the chloroplast. The expression of the 
small subunit of RuBisCo is light regulated. Manipulation of these proteins could increase 
the efficiency of photosynthesis or allow alterations in developmental timing. 

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 
Dedonder et al. (1993) Plant Physiol 101: 801-8 

T. Sialyltransf 

Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family 
catalyze the following reaction: 

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP + 
α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R. 

These proteins are thought to be responsible for the synthesis of the sequence 
NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on sugar chains O-linked to threonine or serine and also 
as a terminal sequence on certain gangliosides in mammalian cells. In plants, glycosyltransferases in the 
Golgi apparatus synthesize cell wall polysaccharides and elaborate the complex glycans of 
glycoproteins. Engineering of plant sialyltransferases allows targeting of proteins to 
particular cellular locations or enables the making of changes in cell wall structure. 

Ref: Wee et al. (1998) Plant Cell 10: 1759-68 
Lee et al. (1994) J Biol Chem 269: 10028-33 
Kitagawa and Paulson (1994) J Biol Chem 269: 1394-401 

U. Signal 

Many plant proteins in this family contain sequences similar to those found in both 
components of the prokaryotic family of signal transducers known as the two-component 
systems. This suggests that activation may require transfer of a phosphate group between 
the transmitter domain and the receiver domain. One family member in Arabidopsis appears 
to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family 
appear to be involved in the regulation of gene transcription under conditions of 
environmental stress. Signal proteins can be exploited to affect plant growth and development 
and/or control plant responses to stress conditions such as cold, nutrient availability, etc. 

Ref: Chang et al. (1993) Science 262: 539-44 
Nagaya et al. (1993) Gene 131: 119-124 
Gottfert et al. (1990) PNAS USA 87: 2680-4 

V. vMSA 

vMSA proteins are major surface antigens present on the envelope of various 
retroviruses. Surface antigens of retroviruses are often involved in tropism of the virus. 
Plants contain retrovirus-like viruses such as pararetroviruses and retrotransposons (i.e. 
transposons having long terminal repeats). Plant retrotransposons in particular have been 
used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the 
like. Manipulation of plant vMSA proteins enables control of tropism of plant retroviruses 
that might be used for genetic engineering tools, thus enabling targeting of the virus to 
particular species and/or tissues of plants. 

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 
Grandbastien et al. (1989) Nature 337: 376-80 
Wright and Voytas (1998) Genetics 149: 703-15 

W. zf-CCCH 

This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger 
domains. These proteins cover a broad range of functions. For example, the COP1 protein 
acts as a repressor of photomorphogenesis in darkness; light stimuli abolish this suppressive 
action. In addition, COP1 protein can function as a negative transcriptional regulator capable 
of direct interaction with components of the G-protein signaling pathway. As a second 
example, a zf-CCCH protein identified in Arabidopsis appears to be involved in the 
resistance to DNA damage induced by UV light and chemical DNA-damaging agents. 
Overexpression of this class of proteins permits production of plants that are better suited to 
adverse environments. Manipulation of expression of zf-CCCH proteins functioning as 
transcriptional regulators, such as COP1, enables manipulation of some signal transduction 
pathways. 

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53 
Deng et al. (1992) Cell 71: 791-801 

X. zf-RanBP 

Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may 
contain RANBP1-like or PPIase domains. Plant proteins having domains similar to these 
include PAS1 and GMSTI. PAS1 has been shown to have dramatic developmental effects 
that appear to be correlated with both cell division and cell wall elongation. GMSTI has high 
identity to the yeast STI stress-inducible gene and has been shown to be heat inducible. 
Proteins such as these may be useful for controlling growth and form of development. 

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43 
Hernandez Torres et al. (1995) 27: 1221-6 

Y. Peptidase M48. 

Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor 
and are located in the membranes of the endoplasmic reticulum. They function in 
NH2-terminal proteolytic processing, as shown for the yeast STE24 gene product. This gene is 
required for the correct processing of a-factor, a yeast pheromone. Family M48 peptidases 
also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX 
processing. Prenylation reactions are believed to be involved in the regulation of protein- 
protein and protein-membrane interactions. As an example, RAS GTPase activity is 
regulated in part by localization to the inner side of the plasma membrane upon prenylation. 
In plants, proteins from this family could be involved in pollen-stigma interactions such as 
those mediating self-pollination vs. outcrossing, or could be members of several secondary 
metabolism pathways. 

Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85 
Tam et al. (1998) J Cell Biol 142: 635-49 

Z. DNA Pol Viral N 

The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase 
isolated from several retroid viruses such as the Cauliflower Mosaic Virus. The domain 
motif has also been found in numerous other species from humans to cyanobacteria. In these 
organisms, this motif seems to be associated with two types of sequences; retrotransposons 
and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved 
in the self-splicing conducted by group II introns. Manipulation of this gene in 
plants allows control of the numerous retrotransposons endogenous to plant genomes or 
allows engineering of mitochondrial function, especially to increase the efficiency of energy 
utilization by cells. 

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72 
Ferat and Michel (1993) Nature 364: 358-61 
Wilson et al. (1994) 368: 32-8 
Cambareri et al. (1994) 242: 658-65 
Gardner et al. (1981) NAR 9: 2871-2888 
Cummings et al. (1990) Curr Genet 17: 375-402 
Hattori et al. (1986) Nature 321: 625-8 

Aa. Calpain_inhib 
This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain 
is a non-lysosomal calcium-dependent intracellular protease that appears to be involved in 
the dynamic changes of the cytoskeleton, especially actin-related structures, during early 
Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains and the subcellular 
distribution of calpastatin is thought to be important to calpain regulation [2]. In plants 
calpains and calpastatins could be involved in embryogenesis and non-embryogenic organ 
reiteration. Mutations occurring in calpain inhibitor repeat domains would produce 
developmental abnormalities such as abnormal leaf, root or flower development. 



Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42. 

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77. 

Ab. chorismate_bind 
Chorismate binding domains are present in plant anthranilate synthase (AS) genes. The AS 
enzymes they encode catalyze the first step in the biosynthesis of tryptophan by converting 
chorismate and L-glutamine to anthranilate, pyruvate and L-glutamate. Some of these genes are involved in 
feedback inhibition by tryptophan [1] while some are feedback insensitive [2]. In 
Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS 
genes is induced by wounding and bacterial pathogen infiltration [1]. Mutations in the 
chorismate binding domain would affect the production of tryptophan and could influence the 
plant's defense system. AS gene products can be used for in vitro synthesis of tryptophan 
and tryptophan derivatives. 

Refs 

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33. 

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43. 

Ac. late_protein_L2 
Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to 
infection by double stranded DNA viruses such as Cauliflower Mosaic virus (CaMV). The 
coat proteins in these plant viruses are critical to the virus life cycle within the plant. For 
example, the coat protein of CaMV is thought to be involved in intra- and inter-cellular 
movement within the plant [1]. Engineering of proteins having similarity to papillomavirus 
coat proteins may enable the production of plants having better resistance to natural plant 
double stranded DNA viruses. 

Refs 

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8. 

Ad. Peptidase_M41 

Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor 
and are integral membrane proteins. They seem to be involved in the degradation of 
carboxy-terminal-tagged cytoplasmic proteins. In plants, these proteins are located in the thylakoid 
membranes of the chloroplasts, their expression is light regulated and they are thought to be 
involved in degradation of soluble stromal proteins and turn-over of thylakoid proteins [1]. 
Manipulation of expression and structure of these proteins would have effects on the 
efficiency of photosynthesis and the development of chloroplasts. 

Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol 
Chem 271: 29329-34. 

Ae. UPF0051 

There is some evidence that, in plants, proteins in this family are involved in ATP synthesis 
in chloroplasts [1, 2]. Mutations in these proteins or altering their expression would affect 
the efficiency of photosynthesis and energy production. 

Refs 

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70. 

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76 

Af. E7 

Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early 
protein 7 (E7) is known as a potent immortalizing and transforming agent. Transformation by 
E7 is thought to be mediated by the physical association of E7 with cellular proteins 
regulating entry into the cell cycle [1]. The result is entry into the cell cycle and suppression 
of terminal differentiation in mammalian cells. Thus, engineering of proteins having 
similarity to papillomavirus E7 protein enables the production of plants having altered 
cellular proliferation characteristics and possibly altered morphology. For example, 
overexpression of E7-like proteins would be expected to result in proliferation of cells of the 
tissue in which the E7 protein is expressed, perhaps with suppression of differentiation 
events. Thus, for example, overexpression of E7-like proteins in meristem cells can result in 
taller plants and suppression of leafing and/or flowering. 



Refs 

1 Zwerschke W, Jansen-Durr P (2000) Adv Cancer Res 78: 1-29. 

Ag. Peptidase U7 

This protein is known to be an integral membrane protein in the cyanobacterium 
Synechocystis where it functions to digest cleaved signal peptides [1]. This activity is 
necessary to maintain proper secretion of mature proteins across the membrane. In higher 
plants this protein may be present in the plastid or chloroplast membranes where it would 
function by enabling protein movement into and out of the chloroplasts. Mutations in this 
protein would be expected to affect the development of plastids, including chloroplasts, or 
alter the energy transfer system within the chloroplasts, thereby affecting growth and 
development. 
Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, 
Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, 
Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, 
Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36. 

Ah. 5'-3' Exonuclease 

The 5'-3' exonuclease domain is found in bacterial DNA polymerases I and in yeast DNA 
repair enzymes such as Exonuclease I. Yeast Exo I is involved in mitotic recombination and 
also includes a domain that interacts with the mismatch repair protein MSH2. The 5'-3' 
exonuclease domain is also present in XPG DNA repair enzymes in humans and in yeast 
RAD9 protein. Defects in XPG proteins result in Xeroderma Pigmentosum. Thus defects in 
5'-3' exonuclease domain-containing proteins in plants are expected to lead to defects in DNA 
repair and correspondingly high spontaneous and inducible mutation rates. Consensus 
sequence: 

IMKKKLLLVDGSSLAFRAFFALPPLTNSAGEPTNAVYGFLKMLIKLIEQEQPTHIAWFDAKAKTFRHELYEGYKAGRAP 
TPDELREQIPLIKELLDALGIPLLEVAGYEADDVIGTLAKIAEKEGYEVLIVTGDRDLLQLVSDHVTVIITKKGIAEFTL 
FTPEAVIEKYGLTPEQIIDYKALMGDSSDNIPGVKGIGEKTAAKLLQEYGSLEGIYANLDKLKGKKLREKLLAHKEDAKL 
SRDLATIKTDVPLDLTLDDLRLPDPDRDALDLLFDE 
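
Because a consensus sequence such as the one above is wrapped across several lines, it can be useful to rejoin the pieces and check them before the sequence is used in a search. The following is a minimal sketch, assuming the sequence uses only the 20 standard one-letter amino acid codes; the function names `join_wrapped` and `suspect_positions` are hypothetical and not part of the specification.

```python
# Assumption: only the 20 standard one-letter residue codes are expected
# (no ambiguity codes such as X or B, and no lowercase letters).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def join_wrapped(lines):
    """Concatenate wrapped sequence lines, dropping all whitespace."""
    return "".join("".join(line.split()) for line in lines)


def suspect_positions(sequence):
    """Return (index, character) pairs for characters that are not
    uppercase standard residue codes, e.g. transcription damage."""
    return [(i, ch) for i, ch in enumerate(sequence) if ch not in VALID_RESIDUES]


# Two wrapped lines taken from the consensus sequence above.
wrapped = [
    "FTPEAVIEKYGLTPEQIIDYKALMGDSSDNIPGVKGIGEKTAAKLLQEYGSLEGIYANL ",
    "DKLKGKKLREKLLAHKEDAKL ",
]
clean = join_wrapped(wrapped)
print(len(clean), suspect_positions(clean))

# A deliberately damaged fragment (hypothetical) is flagged position by position.
print(suspect_positions("LIKLffi"))
```

Any position reported by `suspect_positions` should be checked against the original filing rather than corrected automatically.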
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Ref: 

Fiorentini P. et al. RT. Mol. Cell. Biol. 17:2764-2773(1997). 

Tishkoff et al. Cancer Res. 0:0-0(1998). 

Macinnes M.A. et al. Mol. Cell. Biol. 13:6393-6402(1993). 
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Table A 



Pfam 
Prosite 
Full Name 
Description 

3_5_exonuclease 

3'-5' exonuclease 

Accession number: PF01612 
Definition: 3'-5' exonuclease 
Author: Bashton M, Bateman A 
Alignment method of seed: Clustalw 
Source of seed members: Pfam-B_659 (release 4.1) 
Gathering cutoffs: -11 -11 
Trusted cutoffs: -10.70 -10.70 
Noise cutoffs: -24.50 -24.50 
HMM build command line: hmmbuild HMM SEED 
HMM build command line: hmmcalibrate -seed 0 HMM 
Reference Number: [1] 
Reference Medline: 85137890 
Reference Title: Structure of large fragment of Escherichia coli DNA 
Reference Title: polymerase I complexed with dTMP. 
Reference Author: Ollis DL, Brick P, Hamlin R, Xuong NG, Steitz TA; 
Reference Location: Nature 1985;313:762-766. 
Reference Number: [2] 
Reference Medline: 98060913 
Reference Title: The proofreading domain of Escherichia coli DNA polymerase 
Reference Title: I and other DNA and/or RNA exonuclease domains. 
Reference Author: Moser MJ, Holley WR, Chatterjee A, Mian IS; 
Reference Location: Nucleic Acids Res 1997;25:5110-5118. 
Reference Number: [3] 
Reference Medline: 98361165 
Reference Title: Replication focus-forming activity 1 and the Werner 
Reference Title: syndrome gene product 
Reference Author: Yan H, Chen CY, Kobayashi R, Newport J; 
Reference Location: Nat Genet 1998;19:375-378. 
Reference Number: [4] 
Reference Medline: 97434221 
Reference Title: The Werner syndrome protein is a DNA helicase. 
Reference Author: Gray MD, Shen JC, Kamath-Loeb AS, Blank A, Sopher BL, 
Reference Author: Martin GM, Oshima J, Loeb LA; 
Reference Location: Nat Genet 1997;17:100-103. 
Reference Number: [5] 
Reference Medline: 97370026 
Reference Title: DNA helicase activity in Werner's syndrome gene product 
Reference Title: synthesized in a baculovirus system. 
Reference Author: Suzuki N, Shimamoto A, Imamura O, Kuromitsu J, Kitao S, 
Reference Author: Goto M, Furuichi Y; 
Reference Location: Nucleic Acids Res 1997;25:2973-2978. 
Database Reference: SCOP; 1dpi; fa; [SCOP-USA][CATH-PDBSUM] 
Database Reference INTERPRO; IPR002562; 
Database Reference PDB; 1kfd; 348; 518; 
Database Reference PDB; 1d8y A; 348; 518; 
Database Reference PDB; 1d9d A; 348; 518; 
Database Reference PDB; 1d9f A; 348; 518; 
Database Reference PDB; 1kfs A; 348; 518; 
Database Reference PDB; 1kln A; 348; 518; 
Database Reference PDB; 1krp A; 348; 518; 
Database Reference PDB; 1ksp A; 348; 518; 
Database Reference PDB; 1qsl A; 348; 518; 
Database Reference PDB; 2kfn A; 348; 518; 
Database Reference PDB; 2kfz A; 348; 518; 
Database Reference PDB; 2kzm A; 348; 518; 
Database Reference PDB; 2kzz A; 348; 518; 
Comment: This domain is responsible for the 3'-5' exonuclease proofreading 
Comment: activity of E. coli DNA polymerase I (polI) and other enzymes, 
Comment: it catalyses the hydrolysis of unpaired or mismatched nucleotides. 

Comment: This domain consists of the amino-terminal half of the Klenow fragment 
Comment: in E. coli polI; it is also found in the Werner syndrome helicase 
Comment: (WRN), focus forming activity 1 protein (FFA-1) and ribonuclease D 
Comment: (RNaseD). 
Comment: Werner syndrome is a human genetic disorder causing premature aging; 
Comment: the WRN protein has helicase activity in the 3'-5' direction [4,5]. 
Comment: The FFA-1 protein is required for formation of replication foci 
Comment: and also has helicase activity; it is a homologue of the WRN 
Comment: protein [3]. 
Comment: RNaseD is a 3'-5' exonuclease involved in tRNA processing. 
Comment: Also found in this family is the autoantigen PM/Scl thought to be 
Comment: involved in polymyositis-scleroderma overlap syndrome. 
Number of members: 41 


3HCDH 


PDOC00065 


3-hydroxyacyl-CoA 
dehydrogenase signature 


3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an 
enzyme involved in fatty acid metabolism; it catalyzes the 
oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic 
cells have 2 fatty-acid beta-oxidation systems, one located in 
mitochondria and the other in peroxisomes. In peroxisomes 
3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase 
(ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional 
enzyme where the N-terminal domain bears the hydratase/isomerase 
activities and the C-terminal domain the dehydrogenase activity. 
There are two mitochondrial enzymes: one which is monofunctional 
and the other which is, like its peroxisomal counterpart, 
multifunctional. 

In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) 
HCDH is part of a multifunctional enzyme which also contains an 
ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain 
[2]. 

The other proteins structurally related to HCDH are: 

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), 
which oxidizes 3-hydroxybutanoyl-CoA to acetoacetyl-CoA [3]. 
- Eye lens protein lambda-crystallin [4], which is specific to 
lagomorphs (such as rabbit). 

There are two major regions of similarity in the sequences of 
proteins of the HCDH family: the first one, located in the 
N-terminal region, corresponds to the NAD-binding site; the second 
one is located in the center of the sequence. We have chosen to 
derive a signature pattern from this central region. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)- 
[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV] 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1998 / Pattern and text revised. 

References 

[1] 
Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. 
Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987). 
[2] 
Nakahigashi K., Inokuchi H. 
Nucleic Acids Res. 18:4937-4937(1990). 
[3] 
Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., 
Tabaqchali S. 
FEMS Microbiol. Lett. 124:61-67(1994). 
[4] 
Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., 
de Jong W.W. 
J. Biol. Chem. 263:15462-15466(1988). 


4HPPD_C 




4-hydroxyphenylpyruvate 
dioxygenase C terminal 
domain 


Accession number: PF01626 
Definition: 4-hydroxyphenylpyruvate dioxygenase C terminal domain 
Author: Bateman A 
Alignment method of seed: Clustalw 
Source of seed members: Pfam-B_1116 (release 4.1) 
Gathering cutoffs: -35 -35 
Trusted cutoffs: -25.80 -25.80 
Noise cutoffs: -44.90 -44.90 
HMM build command line: hmmbuild -F HMM SEED 
HMM build command line: hmmcalibrate -seed 0 HMM 
Reference Number: [1] 
Reference Medline: 93279307 
Reference Title: Human 4-hydroxyphenylpyruvate dioxygenase. Primary 
Reference Title: structure and chromosomal localization of the gene. 
Reference Author: Ruetschi U, Dellsen A, Sahlin P, Stenman G, Rymo L, 
Reference Author: Lindstedt S; 
Reference Location: Eur J Biochem 1993;213:1081-1089. 
Database Reference INTERPRO; IPR002887; 
Comment: 4-Hydroxyphenylpyruvic acid dioxygenase (HPD) is an important enzyme 
Comment: in tyrosine catabolism in most organisms. A genetic deficiency in 
Comment: this enzyme in humans and mice leads to hereditary tyrosinemia type 3. 
Comment: The identity of the C-terminus of the HPD makes this part of the 
Comment: molecule a candidate for a functional role in the catalytic process 
Comment: [1]. This region is found as a separate protein Swiss:Q49717 that 
Comment: is somewhat different from HPD and may have a different but related 
Comment: protein function (Unpublished observation Bateman A). 

Number of members: 28 

5_3_exonuclease 

5'-3' exonuclease domain 

The 5'-3' exonuclease domain is found in bacterial DNA 
polymerases I and in yeast DNA repair enzymes such as 
Exonuclease I. Yeast Exo I is involved in mitotic recombination 
and also includes a domain that interacts with the mismatch repair 
protein MSH2. The 5'-3' exonuclease domain is also present in 
XPG DNA repair enzymes in humans and in yeast RAD9 protein. 
Defects in XPG proteins result in Xeroderma Pigmentosum. Thus 
defects in 5'-3' exonuclease domain-containing proteins in plants 
are expected to lead to defects in DNA repair and correspondingly 
high spontaneous and inducible mutation rates. Consensus 
sequence: 

IMKKKLLLVDGSSLAFRAFFALPPLTNSAGEPTNAVYGFLKMLIK 
LIEQEQPTHIAWFDAKAKTFRHELYEGYKAGRAP 
TPDELREQIPLIKELLDALGIPLLEVAGYEADDVIGTLAKLAEKEG 
YEVLIVTGDRDLLQLVSDHVTVIITKKGIAEFTL 
FTPEAVIEKYGLTPEQIIDYKALMGDSSDNIPGVKGIGEKTAAKLL 
QEYGSLEGIYANLDKLKGKKLREKLLAHKEDAKL 
SRDLATIKTDVPLDLTLDDLRLPDPDRDALDLLFDE 

Ref: 

Fiorentini P. et al. RT. Mol. Cell. Biol. 17:2764-2773(1997). 

Tishkoff et al. Cancer Res. 0:0-0(1998). 

Macinnes M.A. et al. Mol. Cell. Biol. 13:6393-6402(1993). 


60s_ribosomal 




60s Acidic ribosomal 
protein 


Accession number: PF00428 
Definition: 60s Acidic ribosomal protein 
Author: Finn RD 
Alignment method of seed: Clustalw 
Source of seed members: Pfam-B_151 (release 1.0) 
Gathering cutoffs: 17 17 
Trusted cutoffs: 17.80 17.80 
Noise cutoffs: 9.30 9.30 
HMM build command line: hmmbuild -f HMM SEED 
HMM build command line: hmmcalibrate -seed 0 HMM 
Reference Number: [1] 
Reference Medline: 96282699 
Reference Title: Proteins P1, P2, and P0, components of the eukaryotic 
Reference Title: ribosome stalk. New structural and functional aspects. 
Reference Author: Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R, 
Reference Author: Rodriguez Gabriel MA, Guarinos E, Ballesta JP; 
Reference Location: Biochem Cell Biol 1995;73:959-968. 
Database Reference INTERPRO; IPR001813; 
Database reference: PFAMB; PB002218; 
Comment: This family includes archaebacterial L12, eukaryotic P0, P1 and P2. 

Number of members: 109 
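
The gathering, trusted and noise cutoffs listed for each Pfam entry can be read as score thresholds for deciding whether a hit belongs to the family. The sketch below illustrates that reading using the per-sequence values of the 60s_ribosomal entry above (17, 17.80 and 9.30). It assumes the usual Pfam convention that a hit scoring at or above the gathering cutoff is accepted, that the trusted cutoff is the lowest score among accepted members, and that the noise cutoff is the highest score among rejected hits; the class, function name and labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Per-sequence bit-score thresholds copied from a table entry."""
    gathering: float
    trusted: float
    noise: float


def classify_hit(score: float, cutoffs: Cutoffs) -> str:
    """Label a bit score relative to the three thresholds.

    Assumption: membership is decided by the gathering cutoff alone;
    the trusted and noise cutoffs only refine the label.
    """
    if score >= cutoffs.trusted:
        return "trusted"
    if score >= cutoffs.gathering:
        return "accepted"
    if score > cutoffs.noise:
        return "rejected"
    return "noise"


# Values from the 60s_ribosomal entry (PF00428).
ribosomal_60s = Cutoffs(gathering=17.0, trusted=17.80, noise=9.30)

for score in (20.0, 17.0, 10.0, 9.30):
    print(score, classify_hit(score, ribosomal_60s))
```

Each entry lists two values per cutoff; only the first is used here, on the assumption that it is the per-sequence threshold and the second the per-domain threshold.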


6PF2K 


PDOC00158 


Phosphoglycerate 
mutase family 
phosphohistidine 
signature 


Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and 
bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally 
related enzymes which catalyze reactions involving the transfer of 
phospho groups between the three carbon atoms of phosphoglycerate 
[1,2]. Both enzymes can catalyze three different reactions, 
although in different proportions: 

- The isomerization of 2-phosphoglycerate (2-PGA) to 
3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate (2,3-DPG) 
as the primer of the reaction. 
- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer. 
- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 
activity). 

In mammals, PGAM is a dimeric protein. There are two isoforms of 
PGAM: the M (muscle) and B (brain) forms. In yeast, PGAM is a 
tetrameric protein. BPGM is a dimeric protein and is found mainly 
in erythrocytes where it plays a major role in regulating 
hemoglobin oxygen affinity as a consequence of controlling 
2,3-DPG concentration. 

The catalytic mechanism of both PGAM and BPGM involves the 
formation of a phosphohistidine intermediate [3]. 

The bifunctional enzyme 6-phosphofructo-2-kinase / 
fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46) (PF2K) 
[4] catalyzes both the synthesis and the degradation of 
fructose-2,6-bisphosphate. PF2K is an important enzyme in the 
regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, 
the fructose-2,6-bisphosphatase reaction involves a 
phosphohistidine intermediate and the phosphatase domain of PF2K 
is structurally related to PGAM/BPGM. 

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase 
(gene cobC), which is involved in cobalamin biosynthesis, also 
belongs to this family. 



We built a signature pattern around the phosphohistidine residue. 



Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVM]-x-R-H-G-[EG]-x(3)-N [H is the 
phosphohistidine residue] 

Sequences known to belong to this class detected by the pattern 
ALL, except for Haemophilus influenzae PGAM. 
Other sequence(s) detected in SWISS-PROT 2. 

Note some organisms harbor a form of PGAM independent of 2,3-DPG; 
this enzyme is not related to the family described above [6]. 
Last update 

November 1995 / Text revised. 
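
A PROSITE consensus pattern such as the phosphohistidine signature above can be scanned against a protein sequence by translating it into a regular expression. The following is a minimal sketch of that translation, not the PROSITE implementation: it assumes patterns built only from single residues, `x` wildcards, `[...]` allowed sets, `{...}` excluded sets and `(n)` or `(n,m)` repeat counts. The function name `prosite_to_regex` and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Assumption: N- and C-terminal anchors ('<' and '>') are not handled.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element such as "x(3)" into its core and repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            regex = core                     # literal residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(regex)
    return "".join(pieces)


signature = prosite_to_regex("[LIVM]-x-R-H-G-[EG]-x(3)-N")
print(signature)

# Hypothetical test sequence containing one occurrence of the signature.
hit = re.search(signature, "AAALTRHGESEWNKK")
print(hit.group(0) if hit else None)
```

The same helper also covers excluded positions, so a pattern element written as `{PHY}` is turned into a negated character class.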

References 

[1] 
Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. 
[2] 
White M.F., Fothergill-Gilmore L.A. 
FEBS Lett. 229:383-387(1988). 

[3] 

Rose ZB. 

Meth. Enzymol. 87:43-51(1982). 
[4] 

Bazan J.F., Fletterick R.J., Pilkis S.J. 

Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989). 

[5] 

O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. 
J. Biol. Chem. 269:26503-26511(1994). 

[6] 

Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., 
Carreras J., Puigdomenech P., Pilkis S.J., Climent F. 
J. Biol. Chem. 267:12797-12803(1992). 
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7tm_5 



7TM chemoreceptor 



Accession number: PF01604 
Definition: 7TM chemoreceptor 
Author: Bateman A 
Alignment method of seed: Clustalw 
Source of seed members: Pfam-B_942 (release 4.1) 
Gathering cutoffs: -46 -46 
Trusted cutoffs: -44.30 -44.30 
Noise cutoffs: -47.80 -47.80 
HMM build command line: hmmbuild -F HMM SEED 
HMM build command line: hmmcalibrate -seed 0 HMM 
Reference Number: [1] 
Reference Medline: 98248686 
Reference Title: Two large families of chemoreceptor genes in the nematodes 
Reference Title: Caenorhabditis elegans and Caenorhabditis briggsae reveal 
Reference Title: extensive gene duplication, diversification, movement, and 
Reference Title: intron loss. 
Reference Author: Robertson HM; 
Reference Location: Genome Res 1998;8:449-463. 
Database Reference INTERPRO; IPR003003; 
Comment: This large family of proteins are related to 7tm_1. 
Comment: They are 7 transmembrane receptors. This family does not 
Comment: include all known members, as there are problems with 
Comment: overlapping specificity with 7tm_1. 
Comment: This family is greatly expanded in the nematode worm C. 
Comment: elegans. 
Number of members: 180 


Aa_trans 




Transmembrane amino 
acid transporter protein 


Accession number: PF01490 

Definition: Transmembrane amino acid transporter 
protein 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_419 (release 4.0) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 1 50.80 1 50.80 

Noise cutoffs: 3.60 3.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98007977 

Reference Title: Identification and characterization of the vesicular GABA 
Reference Title: transporter. 
Reference Author: McIntire SL, Reimer RJ, Schuske K, Edwards RH, Jorgensen 
Reference Author: EM; 
Reference Location: Nature 1997;389:870-876. 
Database Reference INTERPRO; IPR002422; 
Database reference: PFAMB; PB020912; 
Comment: This transmembrane region is found in many amino acid transporters 
Comment: including UNC-47 and MTR. UNC-47 encodes a vesicular amino butyric acid 
Comment: (GABA) transporter, (VGAT). UNC-47 is predicted to have 10 transmembrane 
Comment: domains Swiss:P34579 [1]. MTR is a N system amino acid transporter system 
Comment: protein involved in methyltryptophan resistance Swiss:P38680. 
Comment: Other members of this family include proline transporters and amino 
Comment: acid permeases. 

Number of members: 50 


ABC_tran 


PDOC00185 


ABC transporters family 
signature 


On the basis of sequence similarities a family of related 
ATP-binding proteins has been characterized [1 to 5]. These 
proteins are associated with a variety of distinct biological 
processes in both prokaryotes and eukaryotes, but a majority of 
them are involved in active transport of small hydrophilic 
molecules across the cytoplasmic membrane. All these proteins 
share a conserved domain of some two hundred amino acid residues, 
which includes an ATP-binding site. These proteins are 
collectively known as ABC transporters. 

Proteins known to belong to this family are listed below 
(references are only provided for recently determined sequences). 
In prokaryotes: 

- Active transport systems components: alkylphosphonate uptake 
(phnC/phnK/phnL); arabinose (araG); arginine (artP); dipeptide 
(dciAD;dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); 
galactoside (mglA); glutamine (glnG); glycerol-3-phosphate (ugpC); 
glycine betaine/L-proline (proV); glutamate/aspartate (gltL); 
histidine (hisP); iron(III) (sfuC), iron(III) dicitrate (fecE); 
lactose (lacK); leucine/isoleucine/valine (braF/braG;livF/livG); 
maltose (malK); molybdenum (modC); nickel (nikD/nikE); 
oligopeptide (amiE/amiF;oppD/oppF); peptide (sapD/sapF); phosphate 
(pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine 
(potA); sulfate (cysA); vitamin B12 (btuD). 
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB. 
- Colicin V export protein cvaB. 
- Lactococcin export protein lcnC [6]. 
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin). 
- Extracellular proteases B and C export protein prtD. 
- Alkaline protease secretion protein aprD. 
- Beta-(1,2)-glucan export proteins chvA and ndvA. 
- Haemophilus influenzae capsule-polysaccharide export protein 
bexA. 
- Cytochrome c biogenesis proteins ccmA (also known as cycV and 
helA). 
- Polysialic acid transport protein kpsT. 
- Cell division associated ftsE protein (function unknown). 
- Copper processing protein nosF from Pseudomonas stutzeri. 
- Nodulation protein nodI from Rhizobium (function unknown). 
- Escherichia coli proteins cydC and cydD. 
- Subunit A of the ABC excision nuclease (gene uvrA). 
- Erythromycin resistance protein from Staphylococcus epidermidis 
(gene msrA). 
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) 
[7]. 
- Heterocyst differentiation protein (gene hetA) from Anabaena 
PCC7120. 
- Protein P29 from Mycoplasma hyorhinis, a probable component of 
a high affinity transport system. 
- yhbG, a putative protein whose gene is linked with ntrA in many 
bacteria such as Escherichia coli, Klebsiella pneumoniae, 
Pseudomonas putida, Rhizobium meliloti and Thiobacillus 
ferrooxidans. 
- Escherichia coli and related bacteria hypothetical proteins 
yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS, yhiG, yhiH, 
yjcW, yijK, yojI, yrbF and ytfR. 
In eukaryotes: 
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- The multidrug transporters (Mdr) (P-glycoprotein), a family of 
closely 

related proteins which extrude a wide variety of drugs out of the 
cell {for 
a review see [8]). 

- Cystic fibrosis transmembrane conductance regulator (CFTR), 
which is most 

probably involved in the transport of chloride ions. 
-Antigen peptide transporters 1 (TAP1 , PSF1, RING4, HAM- 
1 , mtpl ) and 2 

(TAP2, PSF2, RING11, HAM-2, mtp2), which are involved in 
the transport of 

antigens from the cytoplasm to a membrane-bound 
compartment for 

association with MHC class I molecules. 

- 70 Kd peroxisomal membrane protein (PMP70). 

- ALDP, a peroxisomal protein involved in X-linked 
adrenoleukodystrophy [9]. 

- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive
potassium channel.

- Drosophila proteins white (w) and brown (bw), which are involved in the
import of ommatidium screening pigments.

- Fungal elongation factor 3 (EF-3). 

- Yeast STE6 which is responsible for the export of the a-factor 
pheromone. 

- Yeast mitochondrial transporter ATM1.

- Yeast MDL1 and MDL2. 

- Yeast SNQ2. 

- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or 
YDR1). 

- Fission yeast heavy metal tolerance protein hmt1. This protein is probably
involved in the transport of metal-bound phytochelatins.

- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2).

- Fission yeast leptomycin B resistance protein (gene pmd1).

- mbpX, a hypothetical chloroplast protein from Liverwort.

- Prestalk-specific protein tagB from slime mold. This protein consists of
two domains: an N-terminal subtilase catalytic domain (see <PDOC00125>) and
a C-terminal ABC transporter domain.

As a signature pattern for this class of proteins, we use a conserved region
which is located between the 'A' and the 'B' motifs of the ATP-binding site.

Consensus pattern

[LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-
[LIVMFYWSTA]

Sequences known to belong to this class detected by the pattern ALL,
except for 25 sequences.

Other sequence(s) detected in SWISS-PROT 42.

Note the ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC,
uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some of those proteins, the
above pattern only detects one of the two copies of the domain.

Note the proteins belonging to this family also contain one or two copies
of the ATP-binding motifs 'A' and 'B' (see <PDOC00017>).
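
The consensus pattern above uses the PROSITE notation: square brackets list the residues allowed at a position, curly braces list residues that are excluded, `x` stands for any residue, and a number in parentheses is a repeat count. A minimal sketch of how such a pattern could be translated into a regular expression and scanned against a sequence follows. The function name `prosite_to_regex`, the pattern string as typed here (reconstructed from the damaged text above, so an assumption) and the test peptide are illustrative only; anchors such as `<` and `>` are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles literal residues, [allowed] sets, {excluded} sets, x (any
    residue) and repeat counts such as x(2) or x(9,12).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # Split the element into its core and an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # literal residue or [allowed set]
        if low and high:
            regex += "{%s,%s}" % (low, high)
        elif low:
            regex += "{%s}" % low
        parts.append(regex)
    return "".join(parts)

# Pattern as reconstructed from the text above (assumption where the
# source is damaged).
ABC_PATTERN = (
    "[LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-"
    "[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-"
    "[LIVMFYWSTA]"
)
ABC_REGEX = re.compile(prosite_to_regex(ABC_PATTERN))

# Hypothetical peptide used only to exercise the regular expression.
peptide = "MKT" + "LSGGQKQRVAIARAL" + "DE"
hit = ABC_REGEX.search(peptide)
print(hit.start() + 1, hit.group())
```

Because the signature is 15 residues long and tolerant at most positions, a regular expression scan of this kind is cheap, but it reports only the copies of the domain that satisfy every position, which is consistent with the note that only one of two duplicated copies may be detected.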

July 1998 / Text revised.

[1]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R.,
Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).

[2]
Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R.
BioEssays 8:111-116(1988).

[3]
Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A.,
Evans I.J., Holland I.B., Gray L., Buckel S.D., Bell A.W.,
Hermodson M.A.
Nature 323:448-450(1986).

[4]
Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas
Nature 323:451-453(1986).

[5]
Blight M.A., Holland I.B.
Mol. Microbiol. 4:873-880(1990).

[6]
Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L.
Appl. Environ. Microbiol. 58:1952-1961(1992).

[7]
Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L.
Gene 102:27-32(1991).

[8]
Gottesman M.M., Pastan I.
J. Biol. Chem. [volume and pages illegible]

[9]
Valle D., Gaertner J.
Nature 361:682-683(1993).

[10]
Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV,
Boyd A.E. III, Gonzalez G., Herrera-Sosa H., Nguy K., Bryan J.,
Nelson D.A.
Science 268:423-426(1995).


ABC2_membrane 


PDOC00692 


ABC-2 type transport 
system integral 
membrane proteins 
signature 


Integral membrane components of a number of bacterial active transport
systems have been shown to be evolutionarily related and to form a distinct
family [1,2]. These proteins are:

- Escherichia coli kpsM, involved in polysialic acid export. 

- Haemophilus influenzae bexB, involved in polyribosylribitol phosphate
capsule polysaccharide export.

- Salmonella typhi vexB, involved in translocation of the Vi
polysaccharide.

- Neisseria meningitidis ctrC, involved in polyneuraminic acid capsule
polysaccharide export.

- Rhizobiaceae nodulation protein J (gene nodJ), probably involved in
exporting a modified beta-1,4-linked N-acetylglucosamine oligosaccharide.

- Streptomyces peucetius drrB, involved in exporting the antibiotics
daunorubicin and doxorubicin.

- Klebsiella pneumoniae O-antigen export system protein rfbA.

- Yersinia enterocolitica O-antigen export system protein rfbD.

- Escherichia coli hypothetical protein yadH. 

- Escherichia coli hypothetical protein yhhJ. 

The molecular size of these proteins is around 30 Kd. They are thought to
contain six transmembrane regions. They either form homo-oligomeric channels
or associate with another type of transmembrane protein to form
hetero-oligomers. Transport systems in which they participate are energized
by an ATP-binding protein that belongs to the ABC transporter family. The
designation 'ABC-2' has been proposed [1] for these transport systems.

As a signature pattern, we selected a conserved region located in the
C-terminal section of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-
[GSAIV]-x(6)-[LIMGA]-[PGSNQ]-x(9,12)-P-[LIMFT]-x-[HRSY]-
x(5)-[RQ]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 2.

Last update

November 1997 / Pattern and text revised.

References

[1]

Reizer J., Reizer A., Saier M.H. Jr.
Protein Sci. 1:1326-1332(1992).

[2]

Vazquez M., Santana O., Quinto C.
Mol. Microbiol. 8:369-377(1993).


ABC-3 




ABC 3 transport family 


Members of this family include receptors that mediate
transmembrane signalling. These receptors can bind to a number
of factors including: amphiregulin, epidermal growth factor, gp30,
heparin-binding egf, insulin, insulin-like growth factor I and II,
neuregulins, transforming growth factor-alpha, and vaccinia
virus growth factor.

Signal transduction is mediated by the catalytic activity of
tyrosine kinase, such as ATP + a protein tyrosine = ADP + protein
tyrosine phosphate. Typically, such signal transduction has
been implicated in metabolic and developmental changes,
including cell fate and differentiation. Examples include
instruction of follicle cells to follow a dorsal pathway of
development rather than the default ventral pathway. Family members
may also bind the spitz protein. References describing these family
members and their biological activities:

Abbot et al., J. Biol. Chem. 267:10759-10763(1992); Araki et al.,
J. Biol. Chem. 262:16186-16191(1987); Aroian et al., EMBO J.
13:360-366(1994); Aroian et al., Nature 348:693-699(1990);
Barbetti et al., Diabetes 41:408-415(1992); Bargmann et al.,
Nature 319:226-230(1986); Cama et al., J. Biol. Chem.
268:8060-8069(1993); Cama et al., J. Clin. Endocrinol. Metab.
73:894-901(1991); Carrera et al., Hum. Mol. Genet. 2:1437-1441(1993);
Clifford et al., Genetics 137:531-550(1994); Cocozza et al.,
Diabetes 41:521-526(1992); Cooke et al., Biochem. Biophys. Res.
Commun. 177:1113-1120(1991); Coussens et al., Science
230:1132-1139(1985); Dickens et al., Biochem. Biophys. Res.
Commun. 186:244-250(1992); Ebina et al., Cell 40:747-758(1985);
Ebina et al., Proc. Natl. Acad. Sci. U.S.A. 84:704-708(1987);
Ehsani et al., Genomics 15:426-429(1993); Elbein et al.,
Diabetes 42:429-434(1993); Elbein, Diabetes 38:737-743(1989);
Fujita-Yamaguchi et al., Protein Seq. Data Anal. 1:3-6(1987);
Gullick et al., EMBO J. 11:43-48(1992); Haruta et al.,
Diabetes 42:1837-1844(1993); Hubbard et al., EMBO J.
16:5572-5581(1997); Hubbard et al., Nature 372:746-754(1994);
Iwanishi et al., Diabetologia 36:414-422(1993); Kadowaki et al.,
J. Clin. Invest. 86:254-264(1990); Kadowaki et al., Science
240:787-790(1988); Kim et al., Diabetologia 35:261-266(1992);
Klinkhamer et al., EMBO J. 8:2503-2507(1989); Kusari et al.,
J. Biol. Chem. 266:5260-5267(1991); Lai et al., Neuron
6:691-704(1991); Lax et al., Mol. Cell. Biol. 8:1970-1978(1988);
Lebrun et al., J. Biol. Chem. 268:11272-11277(1993); Lee et al.,
Oncogene 8:3403-3410(1993); Lesokhin et al., Dev. Biol.
205:129-144(1999); Livneh et al., Cell 40:599-607(1985);
Longo et al., Proc. Natl. Acad. Sci. U.S.A. 90:60-64(1993);
McKeon et al., Mol. Endocrinol. 4:647-656(1990); Moller et al.,
J. Biol. Chem. 265:14979-14985(1990); Moller et al., Mol.
Endocrinol. 4:1183-1191(1990); Odawara et al., Science
245:66-68(1989); Raz et al., Genetics 129:191-201(1991);
Sakai et al., J. Mol. Biol. 256:548-555(1996); Schaeffer et al.,
Biochem. Biophys. Res. Commun. 189:650-653(1992); Schejter
et al., Cell 46:1091-1101(1986); Seino et al., Biochem. Biophys.
Res. Commun. 159:312-316(1989); Seino et al., Diabetes
39:123-128(1990); Semba et al., Proc. Natl. Acad. Sci. U.S.A.
82:6497-6501(1985); Shier et al., J. Biol. Chem.
264:14605-14608(1989); Taira et al., Science 245:63-66(1989);
Tewari et al., J. Biol. Chem. 264:16238-16245(1989); Ullrich
et al., Nature 313:756-761(1985); Ullrich et al., EMBO J.
5:2503-2512(1986); van der Vorm et al., Diabetologia [volume and
pages illegible]; van der Vorm et al., J. Biol. Chem.
267:66-71(1992); Wadsworth et al., Nature 314:178-180(1985);
White et al., Cell 54:641-649(1988); Xu et al., J. Biol. Chem.
265:18673-18681(1990); Yamamoto et al., Nature
319:230-234(1986); and Yoshimasa et al., Science
240:784-787(1988).


ACAT 




Sterol O-acyltransferase 


Accession number: PF01800 

Definition: Sterol O-acyltransferase 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1454 (release 4.2)

Gathering cutoffs: 25 25

Trusted cutoffs: 112.80 112.80

Noise cutoffs: -128.10 -128.10

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98434592 

Reference Title: Characterization of two human genes encoding acyl coenzyme

Reference Title: A:cholesterol acyltransferase-related enzymes.

Reference Author: Oelkers P, Behari A, Cromley D, Billheimer JT, Sturley SL;

Reference Location: J Biol Chem 1998;273:26765-26771.

Reference Number: [2]

Reference Medline: 98434590

Reference Title: Identification of a form of acyl-CoA:cholesterol

Reference Title: acyltransferase specific to liver and intestine in non human

Reference Title: primates.

Reference Author: Anderson RA, Joyce C, Davis M, Reagan JW, Clark M, Shelness

Reference Author: GS, Rudel LL;

Reference Location: J Biol Chem 1998;273:26747-26754.

Reference Number: [3]

Reference Medline: 96243137

Reference Title: Sterol esterification in yeast: a two-gene process.

Reference Author: Yang H, Bard M, Bruner DA, Gleeson A, Deckelbaum RJ,

Reference Author: Aljinovic G, Pohl TM, Rothstein R, Sturley SL;

Reference Location: Science 1996;272:1353-1356.

Database Reference INTERPRO; IPR002688;

Comment: Sterol O-acyltransferases or acyl-CoA:cholesterol acyltransferase

Comment: (ACAT) EC:2.3.1.26 is a transmembrane protein that catalyses the

Comment: esterification of cholesterol to its cholesterol ester storage

Comment: form.
Number of members: 21 
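
Each Pfam record in this table lists three pairs of bit-score cutoffs (gathering, trusted and noise). The table itself gives only the numbers; the sketch below assumes the usual reading of these fields, namely that a hit scoring at or above the gathering cutoff is accepted as a family member, that the trusted cutoff is the lowest score among accepted members, and that the noise cutoff is the highest score among rejected hits. The class and function names are hypothetical, and only the first (per-sequence) value of each pair is used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cutoffs:
    """Per-sequence bit-score cutoffs of one family record."""
    gathering: float
    trusted: float
    noise: float

def classify_hit(score: float, cutoffs: Cutoffs) -> str:
    """Place a bit score relative to the three cutoffs (assumed semantics)."""
    if score >= cutoffs.trusted:
        return "member (at or above trusted cutoff)"
    if score >= cutoffs.gathering:
        return "member (between gathering and trusted cutoffs)"
    if score > cutoffs.noise:
        return "non-member (between noise and gathering cutoffs)"
    return "non-member (at or below noise cutoff)"

# Values taken from the ACAT record above.
acat = Cutoffs(gathering=25.0, trusted=112.80, noise=-128.10)

for score in (150.0, 60.0, -10.0, -200.0):
    print(score, classify_hit(score, acat))
```

A wide gap between the noise and trusted cutoffs, as in the ACAT record, suggests that members and non-members were well separated when the family was built.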


ACPS 




4'-phosphopantetheinyl
transferase superfamily


Accession number: PF01648

Definition: 4'-phosphopantetheinyl transferase superfamily

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1679 (release 4.1)

Gathering cutoffs: 0 0 

Trusted cutoffs: 0.60 0.60 

Noise cutoffs: -4.00 -4.00 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 96027548

Reference Title: Cloning, overproduction, and characterization of the

Reference Title: Escherichia coli holo-acyl carrier protein synthase.

Reference Author: Lambalot RH, Walsh CT;

Reference Location: J Biol Chem 1995;270:24658-24661.

Reference Number: [2]

Reference Medline: 97144264

Reference Title: A new enzyme superfamily - the phosphopantetheinyl

Reference Title: transferases.

Reference Author: Lambalot RH, Gehring AM, Flugel RS, Zuber P, LaCelle M,

Reference Author: Marahiel MA, Reid R, Khosla C, Walsh CT;

Reference Location: Chem Biol 1996;3:923-936.

Reference Number: [3]

Reference Medline: 10581256

Reference Title: Crystal structure of the surfactin synthetase-activating

Reference Title: enzyme sfp: a prototype of the 4'-phosphopantetheinyl

Reference Title: transferase superfamily [In Process Citation]

Reference Author: Reuter K, Mofid MR, Marahiel MA, Ficner R;

Reference Location: EMBO J 1999;18:6823-6831.

Database Reference INTERPRO; IPR002582;

Database reference: PFAMB; PB007908;

Database reference: PFAMB; PB041384;

Comment: Members of this family transfer the

Comment: 4'-phosphopantetheine (4'-PP) moiety from coenzyme A (CoA) to

Comment: the invariant serine of pp-binding. This post-translational

Comment: modification renders holo-ACP capable of acyl group activation

Comment: via thioesterification of the cysteamine thiol of 4'-PP [1].

Comment: This superfamily consists of two subtypes: the ACPS type

Comment: such as Swiss:P24224 and the Sfp type such as Swiss:P39135.

Comment: The structure of the Sfp type is known [3], which shows the

Comment: active site accommodates a magnesium ion. The most highly

Comment: conserved regions of the alignment are involved in binding

Comment: the magnesium ion.
Number of members: 46 
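
The Pfam records in this table are laid out as `Field: value` lines, with long values continued by repeating the field name on the next line (for example the consecutive `Comment:` lines above). A minimal sketch of how such a record could be read back into a dictionary is shown here; the function name `parse_pfam_record` and the sample text are illustrative, and lines without a colon (such as the `Database Reference INTERPRO; ...` lines) are simply skipped in this simplified version.

```python
from collections import defaultdict

def parse_pfam_record(text: str) -> dict:
    """Collect 'Field: value' lines, joining repeated fields with spaces."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and lines without a field separator
        key, value = line.split(":", 1)   # split on the first colon only
        fields[key.strip()].append(value.strip())
    return {key: " ".join(values) for key, values in fields.items()}

# Abbreviated sample in the layout used by the records above.
sample = """Accession number: PF01648
Gathering cutoffs: 0 0
Comment: The structure of the Sfp type is known [3], which shows the
Comment: active site accommodates a magnesium ion.
Number of members: 46"""

record = parse_pfam_record(sample)
print(record["Accession number"])
print(record["Comment"])
```

Splitting on the first colon only keeps values such as `Swiss:P24224` intact when they appear inside a comment.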


ACT 




ACT domain 


Accession number: PF01842

Definition: ACT domain

Author: Bateman A

Alignment method of seed: Manual

Source of seed members: Bateman A

Gathering cutoffs: 25 0

Trusted cutoffs: 26.10 0.50

Noise cutoffs: 24.50 24.50

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 95236205

Reference Title: The allosteric ligand site in the Vmax-type cooperative

Reference Title: enzyme phosphoglycerate dehydrogenase.
Reference Author: Schuller DJ, Grant GA, Banaszak LJ;
Reference Location: Nat Struct Biol 1995;2:69-76.
Reference Number: [2]
Reference Medline: 99241053

Reference Title: Gleaning non-trivial structural, functional and

Reference Title: evolutionary information about proteins by iterative

Reference Title: database searches.
Reference Author: Aravind L, Koonin EV;
Reference Location: J Mol Biol 1999;287:1023-1040.
Database Reference: SCOP; 1psd; fa; [SCOP-USA] [CATH-PDBSUM]

Database Reference INTERPRO; IPR002912;

Database Reference PDB; 1phz A; 35; 110;

Database Reference PDB; 2phm A; 35; 110;

Database Reference PDB; 1psd A; 338; 410;

Database Reference PDB; 1psd B; 338; 410;

Database reference: PFAMB; PB001977;

Database reference: PFAMB; PB008097;

Database reference: PFAMB; PB010480;

Database reference: PFAMB; PB011031;

Database reference: PFAMB; PB031880;

Database reference: PFAMB; PB038464;

Database reference: PFAMB; PB040963;

Database reference: PFAMB; PB041518;

Database reference: PFAMB; PB041667;

Comment: This family of domains generally has a regulatory role.

Comment: ACT domains are linked to a wide range of metabolic

Comment: enzymes that are regulated by amino acid concentration.

Comment: Pairs of ACT domains bind specifically to a particular

Comment: amino acid leading to regulation of the linked enzyme.

Comment: The ACT domain is found in:
Comment: D-3-phosphoglycerate dehydrogenase EC:1.1.1.95 Swiss:P08328,
Comment: which is inhibited by serine [1].
Comment: Aspartokinase EC:2.7.2.4 Swiss:P53553, which is regulated by lysine.

Comment: Acetolactate synthase small regulatory subunit Swiss:P00894,

Comment: which is inhibited by valine.

Comment: Phenylalanine-4-hydroxylase EC:1.14.16.1 Swiss:P00439, which

Comment: is regulated by phenylalanine.
Comment: Prephenate dehydrogenase EC:4.2.1.51 Swiss:P21203.

Comment: formyltetrahydrofolate deformylase EC:3.5.1.10, Swiss:P37051,

Comment: which is activated by methionine and inhibited by glycine.

Comment: GTP pyrophosphokinase EC:2.7.6.5 Swiss:P11585.

Number of members: 177


Activin_recp




Activin types I and II
receptor domain


Accession number: PF01064

Definition: Activin types I and II receptor domain

Author: Finn RD, Bateman A

Alignment method of seed: Clustalw_manual

Source of seed members: Pfam-B_338 (release 3.0)

Gathering cutoffs: 22 22

Trusted cutoffs: 23.10 23.10

Noise cutoffs: 11.30 21.20

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 97454714

Reference Title: From receptor to nucleus: the Smad pathway.

Reference Author: Baker JC, Harland RM;

Reference Location: Curr Opin Genet Dev 1997;7:467-473.

Reference Number: [2]

Reference Medline: 94131268

Reference Title: The TGF-beta superfamily: new members, new receptors, and

Reference Title: new genetic tests of function in different organisms.

Reference Author: Kingsley DM;

Reference Location: Genes Dev 1994;8:133-146.

Reference Number: [3]

Reference Medline: 93390967

Reference Title: Activin receptor-like kinases: a novel subclass of

Reference Title: cell-surface receptors with predicted serine/threonine

Reference Title: kinase activity.

Reference Author: ten Dijke P, Ichijo H, Franzen P, Schulz P, Saras J,

Reference Author: Toyoshima H, Heldin CH, Miyazono K;

Reference Location: Oncogene 1993;8:2879-2887.

Database Reference INTERPRO; IPR000472;

Database reference: PFAMB; PB024112;

Database reference: PFAMB; PB040755;

Comment: This Pfam entry consists of both TGF-beta receptor types.

Comment: This is an alignment of the hydrophilic cysteine-rich

Comment: ligand-binding domains.

Comment: Both receptor types (type I and II) possess a 9 amino

Comment: acid cysteine box, with the consensus CCX{4-5}CN.

Comment: The type I receptors also possess 7 extracellular residues

Comment: preceding the cysteine box.
Number of members: 79 
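
The cysteine box consensus quoted in the comment, CCX{4-5}CN, reads as two cysteines, four or five residues of any kind, then a cysteine and an asparagine. A small sketch of how that consensus could be searched for with a regular expression is given below. The function name and the three test peptides are hypothetical and serve only to show that four or five spacer residues match while three do not.

```python
import re

# CCX{4-5}CN written as a regular expression: X (any residue) becomes ".".
CYSTEINE_BOX = re.compile(r"CC.{4,5}CN")

def find_cysteine_boxes(sequence: str) -> list:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group())
            for m in CYSTEINE_BOX.finditer(sequence.upper())]

# Hypothetical peptides with 4, 5 and 3 spacer residues respectively.
print(find_cysteine_boxes("MACCKDGSCNLT"))
print(find_cysteine_boxes("CCKDGSTCNAA"))
print(find_cysteine_boxes("CCKDGCN"))
```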


Acyl-ACP_TE




Acyl-ACP thioesterase


Accession number: PF01643

Definition: Acyl-ACP thioesterase

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_928 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 91.70 91.70

Noise cutoffs: -192.80 -192.80

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96068671

Reference Title: Modification of the substrate specificity of an acyl-acyl

Reference Title: carrier protein thioesterase by protein engineering.

Reference Author: Yuan L, Voelker TA, Hawkins DJ;
Reference Location: Proc Natl Acad Sci USA 1995;92:10639-10643.

Reference Number: [2]
Reference Medline: 92320297

Reference Title: Fatty acid biosynthesis redirected to medium chains in

Reference Title: transgenic oilseed plants.

Reference Author: Voelker TA, Worrell AC, Anderson L, Bleibaum J, Fan C,

Reference Author: Hawkins DJ, Radke SE, Davies HM;
Reference Location: Science 1992;257:72-74.
Database Reference INTERPRO; IPR002864;
Comment: This family consists of various acyl-acyl carrier protein (ACP)

Comment: thioesterases (TE); these terminate fatty acyl group extension via

Comment: hydrolyzing an acyl group on a fatty acid [1].
Number of members: 30 


Acyltransferase




Acyltransferase


Accession number: PF01553

Definition: Acyltransferase

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_128 (release 4.0)

Gathering cutoffs: 8 8

Trusted cutoffs: 14.40 14.40

Noise cutoffs: 2.50 2.50









HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 97411131

Reference Title: Barth syndrome may be due to an acyltransferase deficiency.

Reference Author: Neuwald AF;

Reference Location: Curr Biol 1997;7:465-466.

Reference Number: [2]

Reference Medline: 96224398

Reference Title: A novel X-linked gene, G4.5. is responsible for Barth

Reference Title: syndrome.

Reference Author: Bione S, D'Adamo P, Maestrini E, Gedeon AK, Bolhuis PA,
Reference Author: Toniolo D;
Reference Location: Nat Genet 1996;12:385-389.
Database Reference INTERPRO; IPR002123;
Database reference: PFAMB; PB009622;
Database reference: PFAMB; PB009717;
Database reference: PFAMB; PB033259;
Database reference: PFAMB; PB041102;
Database reference: PFAMB; PB041638;
Comment: This family contains acyltransferases involved in phospholipid

Comment: biosynthesis and other proteins of unknown function [1]. This

Comment: family also includes tafazzin Swiss:Q16635, the Barth syndrome
Comment: gene [2].
Number of members: 74 


Adaptin_N 




Adaptin N terminal region 


Accession number: PF01602 

Definition: Adaptin N terminal region 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_491 (release 4.0) 

Gathering cutoffs: 12 12 

Trusted cutoffs: 15.50 15.50

Noise cutoffs: 9.00 9.00

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 97409270

Reference Title: Linking cargo to vesicle formation: receptor tail

Reference Title: interactions with coat proteins.
Reference Author: Kirchhausen T, Bonifacino JS, Riezman H;

Reference Location: Curr Opin Cell Biol 1997;9:488-495.
Reference Number: [2]
Reference Medline: 89202379

Reference Title: Structural and functional division into two domains of the

Reference Title: large (100- to 115-kDa) chains of the clathrin-associated

Reference Title: protein complex AP-2.

Reference Author: Kirchhausen T, Nathanson KL, Matsui W, Vaisberg A, Chow

Reference Author: EP, Burne C, Keen JH, Davis AE;
Reference Location: Proc Natl Acad Sci U S A 1989;86:2612-2616.

Database Reference INTERPRO; IPR002553;

Database reference: PFAMB; PB040953;

Comment: This family consists of the N terminal region of various alpha,

Comment: beta and gamma subunits of the AP-1, AP-2 and AP-3 adaptor

Comment: protein complexes. The adaptor protein (AP) complexes are involved in

Comment: the formation of clathrin-coated pits and vesicles [1].

Comment: The N-terminal region of the various adaptor proteins (APs) is constant

Comment: by comparison to the C-terminal which is variable within members of the

Comment: AP-2 family [2]; and it has been proposed that this constant region

Comment: interacts with another uniform component of the coated vesicles [2].
Number of members: 66 


ALAD 


PDOC00153 


Delta-aminolevulinic acid
dehydratase active site


Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) [1] catalyzes the
second step in the biosynthesis of heme, the condensation of two molecules of
5-aminolevulinate to form porphobilinogen. The enzyme is an oligomer composed
of eight identical subunits. Each of the subunits binds an atom of zinc or of
magnesium (in plants). A lysine has been implicated in the catalytic mechanism
[2]. The sequence of the region in the vicinity of the active site residue
is conserved in ALAD from various prokaryotic and eukaryotic species.

Description of pattern(s) and/or profile(s)

Consensus pattern G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y [K is
the active site residue]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
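
Because the consensus pattern marks one specific position (the lysine) as the active site residue, a scan can report the position of that lysine rather than just the start of the match. The sketch below writes the pattern G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y by hand as a regular expression with a capture group around K. The function name and the test peptide are hypothetical and are not taken from any real ALAD sequence.

```python
import re

# G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y, with the lysine captured.
ALAD_SITE = re.compile(r"G.D.[LIVM]{2}[IV](K)P[GSA].{2}Y")

def active_site_lysines(sequence: str) -> list:
    """Return the 1-based position of the pattern's lysine for every hit."""
    return [m.start(1) + 1 for m in ALAD_SITE.finditer(sequence.upper())]

# Hypothetical peptide: the lysine of the motif sits at position 10.
print(active_site_lysines("MSGADMIMVKPGMPYLD"))
# Replacing that lysine with arginine removes the hit.
print(active_site_lysines("MSGADMIMVRPGMPYLD"))
```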
Last update 

November 1995 / Pattern and text revised. 

References 

[1]

Li J.-M., Russell C.S., Cosloy S.D.
Gene 75:177-184(1989).

[2]

Gibbs P.N.B., Jordan P.M.
Biochem. J. 236:447-451(1986).


Aldolase 


PDOC00144 


KDPG and KHG 
aldolases active site 
signatures 


4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the
interconversion of 4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate.
Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-aldolase)
catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into
pyruvate and glyceraldehyde 3-phosphate.

These two enzymes are structurally and functionally related [1]. They are both
homotrimeric proteins of approximately 220 amino-acid residues. They are class
I aldolases whose catalytic mechanism involves the formation of a Schiff-base
intermediate between the substrate and the epsilon-amino group of a lysine
residue. In both enzymes, an arginine is required for catalytic activity.

We developed two signature patterns for these enzymes. The first one contains
the active site arginine and the second, the lysine involved in the
Schiff-base formation.

Description of pattern(s) and/or profile(s)

Consensus pattern G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active
site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for Bacillus subtilis KDPG-aldolase which has Thr
instead of Arg in the active site.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is
involved in Schiff-base formation]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

November 1997 / Patterns and text revised.

References

[1]

Vlahos C.J., Dekker E.E.
J. Biol. Chem. 263:11683-11691(1988).


Alpha_L_fucos


PDOC00324 


Alpha-L-fucosidase 


Alpha-L-fucosidase (EC 3.2.1.51) [1] is a lysosomal enzyme responsible for
hydrolyzing the alpha-1,6-linked fucose joined to the reducing-end
N-acetylglucosamine of the carbohydrate moieties of glycoproteins. Deficiency
of alpha-L-fucosidase results in the lysosomal storage disease fucosidosis.

A cysteine residue is important for the activity of the enzyme. There is only
one cysteine conserved between the sequence of mammalian alpha-L-fucosidase
and that of the slime mold Dictyostelium discoideum. We have derived a pattern
from the region around that conserved cysteine.

Description of pattern(s) and/or profile(s)

Consensus pattern P-x(2)-L-x(3)-K-W-E-x-C [C is the putative
active site residue]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note these proteins belong to family 29 in the classification of 
glycosyl hydrolases [2,E1]. 
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Fisher K.J., Aronson N.N. Jr.
Biochem. J. 264:695-701(1989).

[2]

Henrissat B.
Biochem. J. 280:309-316(1991).

[E1]

http://www.expasy.ch/cgi-bin/lists?glycosid.txt


Amino_oxidase




Flavin containing amine
oxidase


Accession number: PF01593

Definition: Flavin containing amine oxidase

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_606 (release 4.1)

Gathering cutoffs: -110 -110

Trusted cutoffs: -110.00 -110.00

Noise cutoffs: -111.80 -111.80

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98258926

Reference Title: Maize polyamine oxidase: primary structure from protein and

Reference Title: cDNA sequencing.

Reference Author: Tavladoraki P, Schinina ME, Cecconi F, Agostino SD, Manera

Reference Author: F, Rea G, Mariottini P, Federico R, Angelini R;

Reference Location: FEBS Lett 1998;426:62-66.
Reference Number: [2]
Reference Medline: 97306298

Reference Title: A key amino acid responsible for substrate selectivity of

Reference Title: monoamine oxidase A and B.

Reference Author: Tsugeno Y, Ito A;

Reference Location: J Biol Chem 1997;272:14033-14036.

Reference Number: [3]

Reference Medline: 95287865

Reference Title: Cloning, sequencing and heterologous expression of the

Reference Title: monoamine oxidase gene from Aspergillus niger.

Reference Author: Schilling B, Lerch K;

Reference Location: Mol Gen Genet 1995;247:430-438.

Database Reference: SCOP; 1b37; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002937;
Database Reference PDB; 1b37 A; 14; 455;
Database Reference PDB; 1b5q A; 14; 455;
Database Reference PDB; 1b37 B; 14; 455;
Database Reference PDB; 1b37 C; 14; 455;
Database Reference PDB; 1b5q B; 14; 455;
Database Reference PDB; 1b5q C; 14; 455;
Database reference: PFAMB; PB017518;
Database reference: PFAMB; PB024839;
Database reference: PFAMB; PB040747;
Comment: This family consists of various amine oxidases, including maize polyamine
Comment: oxidase (PAO) [1] and various flavin containing monoamine oxidases

Comment: (MAO). The aligned region includes the flavin binding site of these
Comment: enzymes.

Comment: In vertebrates MAO plays an important role regulating the intracellular

Comment: levels of amines via their oxidation; these include various

Comment: neurotransmitters, neurotoxins and trace amines [2]. In lower eukaryotes

Comment: such as aspergillus and in bacteria the main role of amine oxidases is

Comment: to provide a source of ammonium [3].
Comment: PAOs in plants, bacteria and protozoa oxidise spermidine and spermine

Comment: to an aminobutyral, diaminopropane and hydrogen peroxide and are

Comment: involved in the catabolism of polyamines [1].
Comment: Other members of this family include tryptophan 2-monooxygenase,

Comment: putrescine oxidase, corticosteroid binding proteins and antibacterial
Comment: glycoproteins.
Number of members: 58 


ANF_receptor


PDOC00430 


Natriuretic peptides 
receptors signature 


Natriuretic peptides are hormones involved in the regulation of 
fluid and 

electrolyte homeostasis. These hormones stimulate the 

intracellular proauciion 

of cyclic GMP as a second messenger. 

Currently, three types of natriuretic peptide receptors are known 
[1,2]. Two 

express guanylate cyclase activity: GC-A (or ANP-A) which 
seems specific to 

atrial natriuretic peptide (ANP), and GC-B (or ANP-B) which 
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seems to be 

stimulated more effectively by brain natriuretic peptide (BNP) 
man by ANP. 

The third receptor (AN P-C) is probably responsible for the 
clearance of ANP 

from the circulation and does not play a role in signal 
transduction. 

GC-A and GC-B are plasma membrane-bound proteins that 
share the following 

topology: an N-terminal extracellular domain which acts as the 
igand binding 

region, then a transmembrane domain followed by a large 
cytoplasmic C- 

termtnai region that can be subdivided into two domains: a protein 
kinase-iike 

domain (see <PDOC001 00>) that appears important for proper 
signalling and a 

guanylate cyclase catalytic domain (see <PDOC00425>). The 
topology of AN P-C is 

different: like GC-A and -B it possesses an extracellular 
ligand-binding 

region and a transmembrane domain, but its cytoplasmic domain 
is very short. 

We developed a pattern from the ligand-binding region of 
natriuretic peptide 

receptors based on a highly conserved region located in the N- 
terminal part of 
the domain. 

Description of pattern(s) and/or profiie(s) 

Consensus pattern G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

May 1 991 / First entry. 

References 

[1]

Garbers D.L. 

New Biol. 2:499-504(1990). 
[2] 

Schulz S., Chinkers M., Garbers D.L. 
FASEB J. 2:2026-2035(1989). 


ApocytochromeF 


PDOC00169 


Cytochrome c family 
heme-binding site 
signature 


In proteins belonging to cytochrome c family [1], the heme group 
is covalently 

attached by thioether bonds to two conserved cysteine residues. 
The consensus 

sequence for this site is Cys-X-X-Cys-His and the histidine 
residue is one of 

the two axial ligands of the heme iron. This arrangement is 
shared by all 

proteins known to belong to cytochrome c family, which 
presently includes 

cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f

and reaction 

center cytochrome c. 

Description of pattern(s) and/or profile(s)

Consensus pattern C-{CPWHF}-{CPWR}-C-H-{CFYW}
Sequences known to belong to this class detected by the pattern 
ALL, except for four cytochrome c's which lack the first thioether 
bond. 

Other sequence(s) detected in SWISS-PROT 454. 
Note: some cytochrome c's have more than a single bound heme
group: c4 has 2, c7 has 3, c3 has 4, the reaction center has 4, and
cc3/Hmc has 16.
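
Since a single cytochrome c may carry several heme groups, a scan for this signature should report every site rather than only the first. The following is a minimal sketch in Python, reading the final exclusion class as {CFYW} (an assumption about the intended pattern) and using a hypothetical toy sequence. The lookahead makes each match zero-width so that no candidate site is skipped.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW} written directly as a regular expression.
# Wrapping it in a lookahead (?=...) lets finditer report every start
# position, including sites that would otherwise overlap.
HEME_SITE = re.compile(r"(?=(C[^CPWHF][^CPWR]CH[^CFYW]))")


def heme_sites(sequence):
    """Return (0-based position, matched text) for each candidate site."""
    return [(m.start(), m.group(1)) for m in HEME_SITE.finditer(sequence)]


# Hypothetical sequence with two Cys-X-X-Cys-His motifs, illustration only.
toy = "AACAACHKGGGCKQCHTAA"
sites = heme_sites(toy)
```

The number of entries in the returned list gives an upper bound on the heme-binding sites suggested by the pattern alone; as the entry notes, the pattern also matches many unrelated sequences.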
Last update 

June 1992 / Text revised. 

References 

[1] 

Mathews F.S. 

Prog. Biophys. Mol. Biol. 45:1-56(1985). 


arf 


PDOC00781 
PDOC00017 
PDOC01020 


ADP-ribosylation factors
family signature;
ATP/GTP-binding site
motif A (P-loop);
ATP
phosphoribosyltransferase
signature
PROSITE cross-reference(s)


ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-
binding proteins

involved in protein trafficking. They may modulate vesicle
budding and 

uncoating within the Golgi apparatus. ARF's also act as allosteric 
activators 

of cholera toxin ADP-ribosyltransferase activity. They are
evolutionarily

conserved and present in all eukaryotes. At least six forms of ARF 
are present 

in mammals and three in budding yeast. The ARF family also 
includes proteins 

highly related to ARF's but which lack the cholera toxin cofactor 
activity, 

they are collectively known as ARL's (ARF-like). 

ARD1 is a 64 Kd mammalian protein of unknown biological 

function that contains 

an ARF domain at its C-terminal extremity. 

Proteins from the ARF family are generally included in the RAS 
'superfamily' 

of small GTP-binding proteins [5], but they are only slightly 
related to the 

other RAS proteins. They also differ from RAS proteins in that 
they lack 

cysteine residues at their C-termini and are therefore not 
subject to 

prenylation. The ARFs are N-terminally myristoylated (the ARLs 
have not yet 

been shown to be modified in such a fashion). 

As a signature pattern, we selected a conserved region in the C-

terminal part

of ARFs and ARL's. 

Description of pattern(s) and/or profile(s)

Consensus pattern [HRGT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-
[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]
Sequences known to belong to this class detected by the pattern 
ALL, except for 4 sequences. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note: proteins belonging to this family also contain a copy of the
ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
Expert(s) to contact by email 
Kahn R.A. rkahn@bimcore.emory.edu 

Last update 

November 1997 / Pattern and text revised. 
References

[1]

Boman A.L., Kahn R.A.

Trends Biochem. Sci. 20:147-150(1995).

[2]

Moss J., Vaughan M.
Cell. Signal. 4:367-399(1993).

[3]

Moss J., Vaughan M.
Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[4] 

Amor J.C., Harrison D.H., Kahn R.A., Ringe D.
Nature 372:704-708(1994). 

[5] 

Valencia A., Chardin P., Wittinghofer A., Sander C.
Biochemistry 30:4637-4648(1991). 

From sequence comparisons and crystallographic data analysis it
has been shown

[1,2,3,4,5,6] that an appreciable proportion of proteins that bind
ATP or GTP 

share a number of more or less conserved sequence motifs. The 
best conserved 

of these motifs is a glycine-rich region, which typically forms a 
flexible 

loop between a beta-strand and an alpha-helix. This loop interacts 
with one of 

the phosphate groups of the nucleotide. This sequence motif 
is generally 

referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the
P-loop is found.

We list below a number of protein families for which the
relevance of the

presence of such motif has been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).

- Myosin heavy chains. 

- Kinesin heavy chains and kinesin-like proteins (see 
<PDOC00343>). 

- Dynamins and dynamin-like proteins (see <PDOC00362>).

- Guanylate kinase (see <PDOC00670>).

- Thymidine kinase (see <PDOC00524>).

- Thymidylate kinase (see <PDOC01034>).

- Shikimate kinase (see <PDOC00868>). 

- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>). 

- ATP-binding proteins involved in 'active transport' (ABC
transporters) [7]

(see <PDOC00185>). 

- DNA and RNA helicases [8,9,10]. 

- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, 
etc.). 

- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1,
SEC4, etc.). 

- Nuclear protein ran (see <PDOC00859>). 

- ADP-ribosylation factors family (see <PDOC00781>). 

- Bacterial dnaA protein (see <PDOC00771>). 

- Bacterial recA protein (see <PDOC00131>).

- Bacterial recF protein (see <PDOC00539>). 

- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, 
GO, etc.). 

- DNA mismatch repair proteins mutS family (see
<PDOC00388>). 

- Bacterial type II secretion system protein E (see 
<PDOC00567>). 

Not all ATP- or GTP-binding proteins are picked-up by this motif. 
A number of 

proteins escape detection because the structure of their ATP- 
binding site is 

completely different from that of the P-loop. Examples of such 
proteins are 

the E1-E2 ATPases or the glycolytic kinases. In other ATP- or
GTP-binding 

proteins the flexible loop exists in a slightly different form; this is 
the 

case for tubulins or protein kinases. A special mention must be 
reserved for 

adenylate kinase, in which there is a single deviation from the 
P-loop 
pattern: in the last position Gly is found instead of Ser or Thr.

Description of pattern(s) and/or profile(s)
Consensus pattern [AG]-x(4)-G-K-[ST]

Sequences known to belong to this class detected by the pattern a 
majority. 

Other sequence(s) detected in SWISS-PROT in addition to the 
proteins listed above, the 'A' motif is also found in a number of 
other proteins. Most of these proteins probably bind a nucleotide, 
but others are definitely not ATP- or GTP-binding (as for
example chymotrypsin, or human ferritin light chain).
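
That the 'A' motif also turns up in proteins that do not bind nucleotides is expected from its brevity. As a rough worked example, assuming (unrealistically) that all 20 amino acids are equally likely and independent at each position, the chance that a random eight-residue window matches [AG]-x(4)-G-K-[ST] can be computed as below. The database size of 10 million residues is a hypothetical figure chosen only to make the arithmetic concrete.

```python
from fractions import Fraction

AMINO_ACIDS = 20  # assumption: all 20 residues equally likely


def match_probability(class_sizes):
    """Probability that a random window matches, given the number of
    residues allowed at each pattern position (20 for 'x')."""
    p = Fraction(1)
    for size in class_sizes:
        p *= Fraction(size, AMINO_ACIDS)
    return p


# [AG]-x(4)-G-K-[ST]: 2 allowed, four wildcards, then 1, 1 and 2 allowed.
p_loop = match_probability([2, 20, 20, 20, 20, 1, 1, 2])

# Expected chance matches in a hypothetical database of 10 million residues.
expected_hits = float(p_loop * 10_000_000)
```

Under these assumptions the probability is 1 in 40,000 per window, so a few hundred chance matches would be expected in a database of that size, which is consistent with the pattern detecting some unrelated proteins.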
Expert(s) to contact by email 
Koonin E.V. koonin@ncbi.nlm.nih.gov 

Last update 

July 1999 / Text revised.

References 

[1] 

Walker J.E., Saraste M., Runswick M.J., Gay N.J. 
EMBO J. 1:945-951(1982). 

[2] 

Moller W., Amons R.
FEBS Lett. 186:1-7(1985).

[3] 

Fry D.C., Kuby S.A., Mildvan A.S. 

Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986). 

[4] 

Dever T.E., Glynias M.J., Merrick W.C. 

Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987). 

[5] 

Saraste M., Sibbald P.R., Wittinghofer A. 
Trends Biochem. Sci. 15:430-434(1990). 

[6] 

Koonin E.V. 

J. Mol. Biol. 229:1165-1174(1993). 
[7] 

Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R.,
Gallagher M.P. 

J. Bioenerg. Biomembr. 22:571-592(1990). 
[8] 

Hodgman T.C. 

Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). 
[9] 

Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi
K., Schnier J., Slonimski P.P.
Nature 337:121-122(1989). 

[10] 

Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. 
Nucleic Acids Res. 17:4713-4730(1989). 

ATP phosphoribosyltransferase (EC 2.4.2.17) is the enzyme that 
catalyzes the 

first step in the biosynthesis of histidine in bacteria, fungi and 
plants. It 

is a protein of about 23 to 32 Kd. As a signature pattern we 
selected a region 

located in the C-terminal part of this enzyme. 
Description of pattern(s) and/or profile(s) 

Consensus pattern E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-
G-x-T-[LM]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE. 

Last update

July 1998 / First entry.


ArgJ 




ArgJ family 


Accession number: PF01960 

Definition: ArgJ family 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 258.70 99.60 

Noise cutoffs: 7.10 7.10

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 93232760 

Reference Title: Primary structure, partial purification and 
regulation of 

Reference Title: key enzymes of the acetyl cycle of arginine 
biosynthesis in 

Reference Title: Bacillus stearothermophilus: dual function
of ornithine 

Reference Title: acetyltransferase. 

Reference Author: Sakanyan V, Charlier D, Legrain C,
Kochikyan A, Mett I,

Reference Author: Pierard A, Glansdorff N;

Reference Location: J Gen Microbiol 1993;139:393-402.

Database Reference INTERPRO; IPR002813; 

Comment: Members of the ArgJ family catalyse the first
EC:2.3.1.35 and

Comment: fifth steps EC:2.3.1.1 in arginine
biosynthesis.

Number of members: 22 
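
The gathering, trusted and noise cutoffs listed in each Pfam entry are pairs of HMM bit-score thresholds. The following is a minimal sketch of reading such a line and applying the gathering threshold to a hit. It assumes that the first value is a per-sequence threshold and the second a per-domain threshold; that ordering is an assumption not stated in this table, and the function names and example scores are hypothetical.

```python
def parse_cutoffs(line):
    """Parse a line such as 'Gathering cutoffs: 25 25' into a label
    and a pair of floats."""
    label, _, values = line.partition(":")
    first, second = (float(v) for v in values.split())
    return label.strip(), (first, second)


def passes_gathering(seq_score, dom_score, gathering):
    """True if both scores reach the gathering thresholds (assumed order:
    per-sequence threshold first, per-domain threshold second)."""
    return seq_score >= gathering[0] and dom_score >= gathering[1]


label, ga = parse_cutoffs("Gathering cutoffs: 25 25")

# Hypothetical hits: one scoring well above threshold, one below it.
strong_hit = passes_gathering(99.6, 99.6, ga)
weak_hit = passes_gathering(7.1, 7.1, ga)
```

The same parser can be applied to the trusted and noise cutoff lines, since they share the layout of a label followed by two numbers.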


Armadillo_seg 




Armadillo/beta-catenin- 
like repeats 


Accession number: PF00514 

Definition: Armadillo/beta-catenin-like repeats 

Author: Bateman A, Chris Ponting, Joerg Schultz, Peer Bork

Alignment method of seed: Manual 

Source of seed members: SMART 

Gathering cutoffs: 24 0 

Trusted cutoffs: 24.10 0.00

Noise cutoffs: 20.70 20.20

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97442350 

Reference Title: Three-dimensional structure of the armadillo 
repeat region 

Reference Title: of beta-catenin. 

Reference Author: Huber AH, Nelson WJ, Weis WI;

Reference Location: Cell 1997;90:871-882.

Reference Number: [2]

Reference Medline: 96107551

Reference Title: Signal transduction of beta-catenin.

Reference Author: Gumbiner BM;

Reference Location: Curr Opin Cell Biol 1995;7:634-640.

Reference Number: [3] 

Reference Medline: 97454713 

Reference Title: Armadillo and dTCF: a marriage made in 
the nucleus. 

Reference Author: Cavallo R, Rubenstein D, Peifer M;
Reference Location: Curr Opin Genet Dev 1997;7:459-466.
Reference Number: [4] 
Reference Medline: 94082295 

Reference Title: Association of the APC tumor suppressor 
protein with 

Reference Title: catenins. 

Reference Author: Su LK, Vogelstein B, Kinzler KW; 
Reference Location: Science 1993;262:1734-1737.
Reference Number: [5] 
Reference Medline: 94082294 
Reference Title: Association of the APC gene product with
beta-catenin.

Reference Author: Rubinfeld B, Souza B, Albert I, Muller O,
Chamberlain SH,

Reference Author: Masiarz FR, Munemitsu S, Polakis P;
Reference Location: Science 1993;262:1731-1734.
Reference Number: [6]
Reference Medline: 91084846

Reference Title: The segment polarity gene armadillo 
encodes a functionally 

Reference Title: modular protein that is the Drosophila
homolog of human

Reference Title: plakoglobin.

Reference Author: Peifer M, Wieschaus E;

Reference Location: Cell 1990;63:1167-1176.

Database Reference: SCOP; 3bct; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference: EXPERT; Chris.Ponting@human-anatomy.oxford.ac.uk;
Database reference: SMART; ARM; 
Database Reference INTERPRO; IPR000225; 
Database Reference PDB; 1ee5 A; 417; 457;
Database Reference PDB; 1bk5 A; 417; 457;
Database Reference PDB; 1bk5 B; 417; 457;
Database Reference PDB; 1bk6 A; 417; 457;
Database Reference PDB; 1bk6 B; 417; 457;
Database Reference PDB; 1ee4 A; 417; 457;
Database Reference PDB; 1ee4 B; 417; 457;
Database Reference PDB; 1ejl I; 409; 449;
Database Reference PDB; 1ejy I; 409; 449;
Database Reference PDB; 1ial A; 409; 449;
Database Reference PDB; 1ee5 A; 246; 286;
Database Reference PDB; 1bk5 A; 246; 286;
Database Reference PDB; 1bk5 B; 246; 286;
Database Reference PDB; 1bk6 A; 246; 286;
Database Reference PDB; 1bk6 B; 246; 286;
Database Reference PDB; 1ee4 A; 246; 286;
Database Reference PDB; 1ee4 B; 246; 286;
Database Reference PDB; 1ejl I; 241; 280;
Database Reference PDB; 1ejy I; 241; 280;
Database Reference PDB; 1ial A; 241; 280;
Database Reference PDB; 1ee5 A; 288; 328;
Database Reference PDB; 1bk5 A; 288; 328;
Database Reference PDB; 1bk5 B; 288; 328;
Database Reference PDB; 1bk6 A; 288; 328;

Database Reference PDB; 1bk6 B; 288; 328;
Database Reference PDB; 1ee4 A; 288; 328;
Database Reference PDB; 1ee4 B; 288; 328;
Database Reference PDB; 1ejl I; 282; 322;
Database Reference PDB; 1ejy I; 282; 322;
Database Reference PDB; 1ial A; 282; 322;
Database Reference PDB; 1ejl I; 151; 191;
Database Reference PDB; 1ejy I; 151; 191;
Database Reference PDB; 1ial A; 151; 191;
Database Reference PDB; 1ee5 A; 162; 202;
Database Reference PDB; 1bk5 A; 162; 202;
Database Reference PDB; 1bk5 B; 162; 202;
Database Reference PDB; 1bk6 A; 162; 202;
Database Reference PDB; 1bk6 B; 162; 202;
Database Reference PDB; 1ee4 A; 162; 202;
Database Reference PDB; 1ee4 B; 162; 202;
Database Reference PDB; 1ee5 A; 330; 370;
Database Reference PDB; 1bk5 A; 330; 370;
Database Reference PDB; 1bk5 B; 330; 370;
Database Reference PDB; 1bk6 A; 330; 370;
Database Reference PDB; 1bk6 B; 330; 370;
Database Reference PDB; 1ee4 A; 330; 370;
Database Reference PDB; 1ee4 B; 330; 370;
Database Reference PDB; 1ejl I; 324; 364;
Database Reference PDB; 1ejy I; 324; 364;
Database Reference PDB; 1ial A; 324; 364;
Database Reference PDB; 1ee5 A; 372; 412;
Database Reference PDB; 1bk5 A; 372; 412;
Database Reference PDB; 1bk5 B; 372; 412;
Database Reference PDB; 1bk6 A; 372; 412;
Database Reference PDB; 1bk6 B; 372; 412;
Database Reference PDB; 1ee4 A; 372; 412;
Database Reference PDB; 1ee4 B; 372; 412;
Database Reference PDB; 1ejl I; 366; 406;
Database Reference PDB; 1ejy I; 366; 406;
Database Reference PDB; 1ial A; 366; 406;
Database Reference PDB; 1ejl I; 108; 149;
Database Reference PDB; 1ejy I; 108; 149;
Database Reference PDB; 1ial A; 108; 149;
Database Reference PDB; 1ee5 A; 119; 160;
Database Reference PDB; 1bk5 A; 119; 160;
Database Reference PDB; 1bk5 B; 119; 160;
Database Reference PDB; 1bk6 A; 119; 160;
Database Reference PDB; 1bk6 B; 119; 160;
Database Reference PDB; 1ee4 A; 119; 160;
Database Reference PDB; 1ee4 B; 119; 160;
Database Reference PDB; 3bct ; 583; 623;
Database Reference PDB; 2bct ; 583; 623;
Database Reference PDB; 3bct ; 391; 429;
Database Reference PDB; 2bct ; 391; 429;
Database Reference PDB; 3bct ; 224; 264;
Database Reference PDB; 2bct ; 224; 264;
Database Reference PDB; 3bct ; 431; 473;
Database Reference PDB; 2bct ; 431; 473;
Database Reference PDB; 3bct ; 350; 390;
Database Reference PDB; 2bct ; 350; 390;
Database Reference PDB; 1ejl I; 193; 238;
Database Reference PDB; 1ejy I; 193; 238;
Database Reference PDB; 1ial A; 193; 238;
Database Reference PDB; 1ee5 A; 204; 244;
Database Reference PDB; 1bk5 A; 204; 244;
Database Reference PDB; 1bk5 B; 204; 244;
Database Reference PDB; 1bk6 A; 204; 244;
Database Reference PDB; 1bk6 B; 204; 244;
Database Reference PDB; 1ee4 A; 204; 244;
Database Reference PDB; 1ee4 B; 204; 244;
Database Reference PDB; 1ibr D; 399; 437;
Database Reference PDB; 1ibr B; 399; 437;
Database Reference PDB; 1qgk A; 399; 437;
Database Reference PDB; 1qgr A; 399; 437;
Database reference: PFAMB; PB002221;
Database reference: PFAMB; PB002617;
Database reference: PFAMB; PB004638;
Database reference: PFAMB; PB012310;
Database reference: PFAMB; PB040528;
Database reference: PFAMB; PB041028;
Comment: Approx. 40 amino acid repeat. Tandem 
repeats form super-helix of helices 

Comment: that is proposed to mediate interaction of 
beta-catenin with its ligands. 

Comment: CAUTION: This family does not contain all 
known armadillo repeats. 
Number of members: 597 




ATP synt_B_c 


PDOC00137 


ATP synthase alpha and 
beta subunits signature 


ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2]
is a component 

of the cytoplasmic membrane of eubacteria, the inner membrane
of mitochondria, 

and the thylakoid membrane of chloroplasts. The ATPase 
complex is composed of 

an oligomeric transmembrane sector, called CF(0), and a catalytic 
core, called 

coupling factor CF(1). The former acts as a proton channel; the 
latter is 

composed of five subunits, alpha, beta, gamma, delta and 
epsilon. The 

sequences of subunits alpha and beta are related and both 
contain a 

nucleotide-binding site for ATP and ADP. The beta chain has 
catalytic 

activity, while the alpha chain is a regulatory subunit. 

Vacuolar ATPases [3] (V-ATPases) are responsible for acidifying
a variety of

intracellular compartments in eukaryotic cells. Like F-ATPases,
they are

oligomeric complexes of a transmembrane and a catalytic
sector. The sequence 

of the largest subunit of the catalytic sector (70 Kd) is related to 
that of 

F-ATPase beta subunit, while a 60 Kd subunit, from the same 
sector, is related 

to the F-ATPases alpha subunit [4]. 

Archaebacterial membrane-associated ATPases are composed 
of three subunits. 

The alpha chain is related to F-ATPases beta chain and the 
beta chain is 

related to F-ATPases alpha chain [4]. 

A protein highly similar to F-ATPase beta subunits is found [5] 
in some 

bacterial apparatus involved in a specialized protein export 
pathway that 

proceeds without signal peptide cleavage. This protein is 
known as fliI in

Bacillus and Salmonella, Spa47 (mxiB) in Shigella flexneri, 
HrpB6 in 

Xanthomonas campestris and yscN in Yersinia virulence 
plasmids. 

In order to detect these ATPase subunits, we took a segment of 
ten amino-acid 

residues, containing two conserved serines, as a signature 
pattern. The first 

serine seems to be important for catalysis - in the ATPase 
alpha chain at 

least - as its mutagenesis causes catalytic impairment. 
Description of pattern(s) and/or profile(s)

Consensus pattern P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is
a putative active site residue]

Sequences known to belong to this class detected by the pattern 
ALL, except for the archaebacterium Sulfolobus acidocaldarius 
ATPase alpha chain where the first Ser is replaced by Gly. 
Other sequence(s) detected in SWISS-PROT 37. 

Note F-ATPase alpha and beta subunits, V-ATPase 70 Kd subunit 
and the archaebacterial ATPase alpha subunit also contain a 
copy of the ATP-binding motifs A and B (see <PDOC00017>). 
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Futai M., Noumi T., Maeda M. 

Annu. Rev. Biochem. 58:111-136(1989).

[2] 

Senior A.E. 

Physiol. Rev. 68:177-231(1988). 
[3] 

Nelson N. 

J. Bioenerg. Biomembr. 21:553-571(1989). 
[4] 

Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J.,
Bowman B.J., Manolson M.F., Poole R.J., Date T., Oshima T.,
Konishi J., Denda K., Yoshida M.
Proc. Natl. Acad. Sci. U.S.A. 86:6661-6665(1989).

[5]

Dreyfus G., Williams A.W., Kawagishi I., MacNab R.M.
J. Bacteriol. 175:3131-3138(1993).


ATP-gua_Ptrans


PDOC00103


ATP:guanido
phosphotransferases
active site


ATP:guanido phosphotransferases are a family of structurally and
functionally

related enzymes [1,2] that reversibly catalyze the transfer of
phosphate

between ATP and various phosphogens. The enzymes that
belong to this family

are:

- Creatine kinase (EC 2.7.3.2) (CK) [3,4], which plays an
important role in

energy metabolism of vertebrates. It catalyzes the reversible
transfer of

high energy phosphate from ATP to creatine, generating 
phosphocreatine and 

ADP. There are at least four different, but very closely related, 
forms of 

CK. Two of the CK isozymes are cytosolic: the M (muscle) 
and B (brain) 

forms while the two others are mitochondrial. In sea urchin 
there is a 

flagellar isozyme, which consists of the triplication of a CK- 
domain. 

- Glycocyamine kinase (EC 2.7.3.1) (guanidoacetate kinase), an
enzyme that 

catalyzes the transfer of phosphate from ATP to guanidoacetate. 

- Arginine kinase (EC 2.7.3.3), an enzyme that catalyzes the 
transfer of 

phosphate from ATP to arginine. 

- Taurocyamine kinase (EC 2.7.3.4), an annelid-specific enzyme 
that catalyzes 

the transfer of phosphate from ATP to taurocyamine. 

- Lombricine kinase (EC 2.7.3.5), an annelid-specific enzyme 
that catalyzes 

the transfer of phosphate from ATP to lombricine. 

- Smc74 [1], a cercaria-specific enzyme from Schistosoma 
mansoni. This enzyme 

consists of two CK-related duplicated domains. The substrate(s) 
specificity 
of Smc74 is not yet known. 

A cysteine residue is implicated in the catalytic activity of these 
enzymes. 

The region around this active site residue is highly conserved and 

can be used 

as a signature pattern. 

Description of pattern(s) and/or profile(s) 

Consensus pattern C-P-x(0,1)-[ST]-N-[IL]-G-T [C is the active site
residue] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised.

References 

[1] 

Stein L.D., Harn D.A., David J.R. 
J. Biol. Chem. 265:6582-6588(1990). 

[2] 

Strong S.J., Ellington W.R.

Biochim. Biophys. Acta 1246:197-200(1995).

[3]

Bessman S.P., Carpenter C.L.

Annu. Rev. Biochem. 54:831-862(1985). 

[4] 

Haas R.C., Strauss A.W. 

J. Biol. Chem. 265:6921-6927(1990). 
Definition: ATP synthase subunit D

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1304 (release 4.2)

Gathering cutoffs: 25 25

Trusted cutoffs: 157.80 157.80

Noise cutoffs: -79.90 -79.90

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96324968 

Reference Title: Subunit structure and organization of the
genes of the A1AO

Reference Title: ATPase from the Archaeon Methanosarcina
mazei Go1.

Reference Author: Wilms R, Freiberg C, Wegerle E, Meier I,
Mayer F, Muller V;

Reference Location: J Biol Chem 1996;271:18843-18852.
Reference Number: [2]
Reference Medline: 95132627

Reference Title: A bovine cDNA and a yeast gene (VMA8) 
encoding the subunit 

Reference Title: D of the vacuolar H(+)-ATPase.
Reference Author: Nelson H, Mandiyan S, Nelson N;
Reference Location: Proc Natl Acad Sci U S A 1995;92:497-501.

Database Reference INTERPRO; IPR002699;
Comment: This is a family of subunit D from various
ATP synthases

Comment: including V-type H+ transporting and Na+-dependent.

Comment: Subunit D is suggested to be an integral 
part of the 

Comment: catalytic sector of the V-ATPase [2].
Number of members: 21 


ATZ_TRZ




Chlorohydrolase 


Accession number: PF01685 

Definition: Chlorohydrolase 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1192 (release 4.1)

Gathering cutoffs: -84 -84

Trusted cutoffs: -74.80 -74.80

Noise cutoffs: -94.30 -94.30

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96326334 

Reference Title: Atrazine chlorohydrolase from
Pseudomonas sp. strain ADP:

Reference Title: gene sequence, enzyme purification, and
protein

Reference Title: characterization [published erratum appears
in J Bacteriol

Reference Title: 1999 Jan;181(2):695]

Reference Author: de Souza ML, Sadowsky MJ, Wackett LP;

Reference Location: J Bacteriol 1996;178:4894-4900.

Reference Number: [2]

Reference Medline: 96011356

Reference Title: Cloning and expression of the s-triazine 
hydrolase gene 

Reference Title: (trzA) from Rhodococcus corallinus and 
development of 

Reference Title: Rhodococcus recombinant strains capable
of dealkylating and

Reference Title: dechlorinating the herbicide atrazine.
Reference Author: Shao ZQ, Seffens W, Mulbry W, Behki
RM;

Reference Location: J Bacteriol 1995;177:5748-5755.

Database Reference INTERPRO; IPR002604; 

Database reference: PFAMB; PB034853; 

Database reference: PFAMB; PB040603; 

Comment: This family consists of chlorohydrolase from
the ATZ/TRZ family;

Comment: these enzymes catalyse hydrolytic
dechlorination of their substrates.

Comment: Atrazine chlorohydrolase (AtzA) from
Pseudomonas sp. Swiss:P72156

Comment: catalyses the dechlorination of atrazine to
hydroxyatrazine [1].

Comment: s-Triazine hydrolase (TrzA) from R.
corallinus Swiss:P72156

Comment: catalyses the deamination and dechlorination
of melamine and

Comment: deethylsimazine to ammeline and N-
ethylammeline [1].

Number of members: 29


B56 




Protein phosphatase 2A 
regulatory B subunit (B56 
family) 


Accession number: PF01603 

Definition: Protein phosphatase 2A regulatory B subunit
(B56 family)

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_984 (release 4.1)

Gathering cutoffs: 11 11

Trusted cutoffs: 17.80 17.80

Noise cutoffs: 5.50 5.50

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96064678 

Reference Title: Identification of a new family of protein 
phosphatase 2A 

Reference Title: regulatory subunits. 

Reference Author: McCright B, Virshup DM;

Reference Location: J Biol Chem 1995;270:26123-26128.

Database Reference INTERPRO; IPR002554; 

Comment: Protein phosphatase 2A (PP2A) is a major 

intracellular protein 

Comment: phosphatase that regulates multiple aspects 
of cell growth and metabolism. 

Comment: The ability of this widely distributed 

Comment: diverse array of substrates is largely 
controlled by the nature of its 

Comment: regulatory B subunit. There are multiple
families of B subunits (see also

Comment: PR55); this family is called the B56 family
[1].

Number of members: 34 


Bac_export_1




Bacterial export proteins, 
family 1 


Accession number: PF01311 

Definition: Bacterial export proteins, family 1 

Author: Finn RD, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B J 442 (release 3.0) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 37.20 37.20 

Noise cutoffs: -95.00 -95.00 

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 95113771

Reference Title: Caulobacter FliQ and FliR membrane 
proteins, required for 

Reference Title: flagellar biogenesis and cell division, belong 
to a family 

Reference Title: of virulence factor export proteins. 

Reference Author: Zhuang WY, Shapiro L; 

Reference Location: J Bacteriol 1995;177:343-356.

Database Reference INTERPRO; IPR002010;

Comment: This family includes the following members; 

Comment: FliR, MopE, SsaT, YopT, Hrp, HrcT and 

SpaR 

Comment: All of these members export proteins, that 
do not possess signal 

Comment: peptides, through the membrane. Although 
the proteins that these 
Comment: exporters move may be different, the
exporters are thought to

Comment: function in similar ways [1].
Number of members: 29 


Band_41 


PDOC00566 


Band 4.1 family domain 
signatures and profile 


A number of cytoskeletal-associated proteins that associate
with various

proteins at the interface between the plasma membrane and the
cytoskeleton

contain a conserved N-terminal domain of about 150 amino-acid
residues [1,2,

3]. The proteins in which such a domain is known to exist are
listed below.

- Band 4.1, which links the spectrin-actin cytoskeleton of 
erythrocytes to 

the plasma membrane. Band 4.1 binds with a high affinity to 
glycophorin and 
with lower affinity to band 3 protein. 

- Ezrin (cytovillin or p81), a component of the undercoat of the 
microvilli 

plasma membrane. 

- Moesin, which is probably involved in binding major cytoskeletal 
structures 

to the plasma membrane. 

- Radixin, which seems to play a crucial role in the binding of 
the barbed 

end of actin filaments to the plasma membrane in the undercoat 
of the cell- 
to-cell adherens junction (AJ). 

- Talin, which binds with high affinity to vinculin and with low 
affinity to 

integrins. Talin is a high molecular weight (270 Kd) cytoskeletal 
protein 

concentrated in regions of cell-substratum contact and, in 
lymphocytes, of 
cell-cell contacts. 

- Filopodin, a slime mold protein that binds actin and which is
involved in 

the control of cell motility and chemotaxis. 

- Merlin (or schwannomin). Defects in this protein are the cause 
of type 2 

neurofibromatosis (NF2), a predisposition to tumors of the 
nervous system. 

- Protein NBL4. 

- Protein-tyrosine phosphatases PTPN3 (PTP-H1) and 
PTPN4 (PTP-MEG1). 

Structurally these two very similar enzymes are composed of
an N-terminal

band 4.1-like domain followed by a central segment of unknown
function and

a C-terminal catalytic domain (see <PDOC00323>). They
could act at

junctions between the membrane and the cytoskeleton.
- Protein-tyrosine phosphatases PTPN14 (PEZ or PTP36) and
PTP-D1, PTP-RL10

and PTP2E. These phosphatases also consist of an N-terminal
band 4.1-like

domain and a C-terminal catalytic domain. The central
domain seems to
contain a SH3-binding domain.

- Caenorhabditis elegans protein phosphatase ptp-1.

Ezrin, moesin, and radixin are highly related proteins, but the 
other proteins 

in which this domain is found do not share any region of similarity 
outside of 

the domain. In band 4.1 this domain is known to be 
important for the 

interaction with glycophorin, an integral membrane protein. 

We have developed two signature patterns for this domain, one is 
based on the 



conserved positions found at the N-terminal extremity of the
domain, the

second is located in the C-terminal section.
Description of pattern(s) and/or profile(s)

Consensus pattern W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-
[LIVMF]-x(6,8)-[LIVMF]-x(3,5)-F-[FY]-x(2)-[DENS]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-
[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-[FY]-G-x-[DENQST]-[LIVMFYS]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT 7. 

Note this documentation entry is linked to both a signature pattern 
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 
Expert(s) to contact by email 
Rees J. jrees@vax.oxford.ac.uk 

Last update 

November 1997 / Patterns and text revised; profile added. 

References 

[1] 

Rees D.J.G., Ades S.A., Singer S.J., Hynes R.O. 
Nature 347:685-689(1990). 

[2]

Funayama N., Nagafuchi A., Sato N., Tsukita S., Tsukita S.
J. Cell Biol. 115:1039-1048(1991).

[3]

Takeuchi K., Kawashima A., Nagafuchi A., Tsukita S.
J. Cell Sci. 107:1921-1928(1994).


biotin_lipoyl


PDOC00167; 
PDOC00168 


Biotin-requiring enzymes; 
2-oxo acid 
dehydrogenases 
acyltransferase 
component lipoyl binding 


Biotin, which plays a catalytic role in some carboxyl transfer reactions, is
covalently attached, via an amide bond, to a lysine residue in enzymes
requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1)
  (transcarboxylase).
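The EC numbers in the list above already encode what kind of reaction each biotin-requiring enzyme catalyzes. The following is a minimal sketch, assuming only the standard meaning of the first EC digit; the dictionary and function names are illustrative and are not part of the original entry.

```python
from collections import defaultdict

# Top-level EC classes (first digit of the EC number).
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}

# Enzymes and EC numbers exactly as listed in the entry above.
BIOTIN_ENZYMES = {
    "Pyruvate carboxylase": "6.4.1.1",
    "Acetyl-CoA carboxylase": "6.4.1.2",
    "Propionyl-CoA carboxylase": "6.4.1.3",
    "Methylcrotonoyl-CoA carboxylase": "6.4.1.4",
    "Geranoyl-CoA carboxylase": "6.4.1.5",
    "Urea carboxylase": "6.3.4.6",
    "Oxaloacetate decarboxylase": "4.1.1.3",
    "Methylmalonyl-CoA decarboxylase": "4.1.1.41",
    "Glutaconyl-CoA decarboxylase": "4.1.1.70",
    "Methylmalonyl-CoA carboxyl-transferase": "2.1.3.1",
}


def group_by_ec_class(enzymes):
    """Group enzyme names by the class named by the first EC digit."""
    groups = defaultdict(list)
    for name, ec_number in enzymes.items():
        top_level = int(ec_number.split(".")[0])
        groups[EC_CLASSES[top_level]].append(name)
    return dict(groups)


if __name__ == "__main__":
    for ec_class, names in group_by_ec_class(BIOTIN_ENZYMES).items():
        print(ec_class, len(names))
```

Grouping this way makes it easy to see that the carboxylases fall in the ligase class, the decarboxylases in the lyase class, and the transcarboxylase in the transferase class.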

Sequence data reveal that the region around the biocytin 
(biotin-lysine) 

residue is well conserved and can be used as a signature pattern. 
Description of pattern(s) and/or profile(s) 

Consensus pattern [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-
M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] [K is the biotin attachment site]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note the domain around the biotin-binding lysine residue is
evolutionarily related to that around the lipoyl-binding lysine residue
of 2-oxo acid dehydrogenase acyltransferases (see
<PDOC00168>).
Last update

November 1997 / Pattern and text revised.
References

[1]

Knowles J.R.
Annu. Rev. Biochem. 58:195-221(1989).

[2]

Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C.,
Wood H.G.
J. Biol. Chem. 263:6461-6464(1988).

[3]

Goss N.H., Wood H.G.
Meth. Enzymol. 107:261-278(1984).

[4]

Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood
H.G., Samols D.
J. Biol. Chem. 267:18407-18412(1992).

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and
eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to
the corresponding acyl-CoA. The three members of this family of multienzyme
complexes are:

- Pyruvate dehydrogenase complex (PDC).
- 2-oxoglutarate dehydrogenase complex (OGDC).
- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of
multiple copies of three component enzymes - E1, E2 and E3. E1 is a thiamine
pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide
acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

E2 acyltransferases have an essential cofactor, lipoic acid, which is
covalently bound via an amide linkage to a lysine group. The E2 components of
OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one
(in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in
Escherichia coli) lipoyl groups [3].

In addition to the E2 components of the three enzymatic complexes described
above, a lipoic acid cofactor is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme
  complex of four protein components, which catalyzes the degradation of
  glycine. H protein shuttles the methylamine group of glycine from the P
  protein to the T protein. H-protein from either prokaryotes or eukaryotes
  binds a single lipoic group.
- Mammalian and yeast pyruvate dehydrogenase complexes differ from that of
  other sources, in that they contain, in small amounts, a protein of unknown
  function - designated protein X or component X. Its sequence is closely
  related to that of E2 subunits and seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6].
  This protein is most probably a dihydrolipamide acyltransferase involved in
  acetoin metabolism.

We developed a signature pattern which allows the detection of the
lipoyl-binding site.

Description of pattern (s) and/or profile(s) 

Consensus pattern [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-
x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-
[LIVMFY] [K is the lipoyl-binding site]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 2. 

Note the domain around the lipoyl-binding lysine residue is 
evolutionary related to that around the biotin-binding lysine 
residue of biotin requiring enzymes (see <PDOC00167>). 
Last update 

November 1995 / Text revised. 
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Biotinsynth 




Biotin synthase 


Accession number: PF01792
Definition: Biotin synthase
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1407 (release 4.2)
Gathering cutoffs: -180 -180
Trusted cutoffs: -176.30 -176.30
Noise cutoffs: -183.90 -183.90
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 96312354
Reference Title: Cloning, sequencing, and characterization of the Bacillus
Reference Title: subtilis biotin biosynthetic operon.
Reference Author: Bower S, Perkins JB, Yocum RR, Howitt CL, Rahaim P, Pero J;
Reference Location: J Bacteriol 1996;178:4122-4130.
Reference Number: [2]
Reference Medline: 97074643
Reference Title: Two new members of the bio B superfamily: cloning,
Reference Title: sequencing and expression of bio B genes of Methylobacillus
Reference Title: flagellatum and Corynebacterium glutamicum.
Reference Author: Serebriiskii IG, Vassin VM, Tsygankov YD;
Reference Location: Gene 1996;175:15-22.
Database Reference: INTERPRO; IPR002684;
Database reference: PFAMB; PB023954;
Database reference: PFAMB; PB040740;
Database reference: PFAMB; PB041208;
Comment: Biotin synthase EC:2.8.1.6 works with flavodoxin, S-adenosylmethionine,
Comment: and possibly cysteine to convert dethiobiotin to biotin [1].
Comment: Biotin (vitamin H) is a prosthetic group in enzymes catalysing
Comment: carboxylation and transcarboxylation reactions [2].
Number of members: 29


BolA


BolA-like protein


Accession number: PF01722
Definition: BolA-like protein
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1996 (release 4.1)
Gathering cutoffs: 23 23
Trusted cutoffs: 23.70 23.70
Noise cutoffs: -16.00 -16.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 99291046
Reference Title: The stationary-phase morphogene bolA from Escherichia coli
Reference Title: is induced by stress during early stages of growth.
Reference Author: Santos JM, Freire P, Vicente M, Arraiano CM;
Reference Location: Mol Microbiol 1999;32:789-798.
Reference Number: [2]
Reference Medline: 90059998
Reference Title: Induction of a growth-phase-dependent promoter triggers
Reference Title: transcription of bolA, an Escherichia coli morphogene.
Reference Author: Aldea M, Garrido T, Hernandez-Chico C, Vicente M, Kushner
Reference Author: SR;
Reference Location: EMBO J 1989;8:3923-3931.
Database Reference: INTERPRO; IPR002634;
Comment: This family consists of the morpho-protein BolA from
Comment: E. coli and its various homologs. In E. coli over expression of
Comment: this protein causes round morphology and may be involved in
Comment: switching the cell between elongation and septation systems during
Comment: cell division [1]. The expression of BolA is growth rate regulated
Comment: and is induced during the transition into the stationary
Comment: phase [1]. BolA is also induced by stress during early stages of
Comment: growth [1] and may have a general role in stress response.
Comment: It has also been suggested that BolA can induce the transcription
Comment: of penicillin binding proteins 6 and 5 [2,1].
Number of members: 18




casein kappa 






Accession number: PF00997
Definition: Kappa casein
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1298 (release 3.0)
Gathering cutoffs: -32 -32
Trusted cutoffs: 16.40 16.40
Noise cutoffs: -73.00 -73.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98072500
Reference Title: Nucleotide sequence evolution at the kappa-casein locus:
Reference Title: evidence for positive selection within the family Bovidae.
Reference Author: Ward TJ, Honeycutt RL, Derr JN;
Reference Location: Genetics 1997;147:1863-1872.
Database Reference: INTERPRO; IPR000117;
Comment: Kappa-casein is a mammalian milk protein involved in a
Comment: number of important physiological processes. In the gut,
Comment: the ingested protein is split into an insoluble peptide
Comment: (para kappa-casein) and a soluble hydrophilic glycopeptide
Comment: (caseinomacropeptide). Caseinomacropeptide is responsible
Comment: for increased efficiency of digestion, prevention of neonate
Comment: hypersensitivity to ingested proteins, and inhibition of
Comment: gastric pathogens.
Number of members: 56




CAT 


PDOC00093 


Chloramphenicol 
acetyltransferase 


Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) [1] catalyzes the
acetyl-CoA dependent acetylation of chloramphenicol (Cm), an antibiotic which
inhibits prokaryotic peptidyltransferase activity. Acetylation of Cm by CAT
inactivates the antibiotic. A histidine residue, located in the C-terminal
section of the enzyme, plays a central role in its catalytic mechanism. We
derived a signature pattern from the region surrounding this active site
residue.

Description of pattern(s) and/or profile(s) 

Consensus pattern Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H [The second
H is the active site residue]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
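The consensus patterns in these entries use a compact notation: residues are separated by hyphens, square brackets list the residues allowed at a position, x stands for any residue, and a parenthesized number or range gives a repeat count. The following is a minimal sketch of how such a pattern could be translated into a regular expression and used to scan a sequence. The function name and the test sequence are illustrative assumptions; the sequence is synthetic and is not a real CAT protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat
        # suffix such as (2) or (0,2).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            token = re.escape(core)          # a single fixed residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


CAT_PATTERN = "Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H"
CAT_REGEX = re.compile(prosite_to_regex(CAT_PATTERN))

if __name__ == "__main__":
    synthetic = "AAQLHHSGGDGFHKK"  # made-up sequence for illustration only
    hit = CAT_REGEX.search(synthetic)
    # The second H of the pattern is three positions after the match start.
    print(hit.start(), hit.start() + 3)
```

A regular expression only reports whether the motif is present; it says nothing about the sensitivity or specificity figures quoted in the entry, which come from scans of SWISS-PROT.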

Note there is a second family of CAT [2], evolutionarily unrelated to
the main family described above. These CAT belong to the
bacterial hexapeptide-repeat containing-transferases family (see
<PDOC00094>).
Last update

November 1997 / Text revised.
References

[1]

Shaw W.V., Leslie A.G.W.
Annu. Rev. Biophys. Chem. 20:363-386(1991).

[2]

Parent R., Roy P.H.
J. Bacteriol. 174:2891-2897(1992).


Cation_efflux


Accession number: PF01545
Definition: Cation efflux family
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_232 (release 4.0)
Gathering cutoffs: -6 -6
Trusted cutoffs: 6.90 6.90
Noise cutoffs: -19.30 -19.30
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98361887
Reference Title: Molecular characterization of a chromosomal determinant
Reference Title: conferring resistance to zinc and cobalt ions in
Reference Title: Staphylococcus aureus.
Reference Author: Xiong A, Jayaswal RK;
Reference Location: J Bacteriol 1998;180:4024-4029.
Reference Number: [2]
Reference Medline: 96219090
Reference Title: Cloning and sequence analysis of czc genes in Alcaligenes
Reference Title: sp. strain CT14.
Reference Author: Kunito T, Kusano T, Oyaizu H, Senoo K, Kanazawa S,
Reference Author: Matsumoto S;
Reference Location: Biosci Biotechnol Biochem 1996;60:699-704.
Database Reference: INTERPRO; IPR002524;
Database reference: PFAMB; PB038216;
Comment: Members of this family are integral membrane proteins, that
Comment: are found to increase tolerance to divalent metal ions such
Comment: as cadmium, zinc, and cobalt. These proteins are thought to
Comment: be efflux pumps that remove these ions from cells.
Number of members: 59
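Each Pfam record carries three pairs of score cutoffs. The following is a minimal sketch of how those lines could be read and sanity-checked, under two stated assumptions that the record itself does not spell out: that the first number of each pair applies to whole-sequence scores and the second to per-domain scores, and that a match counts as a family member when its score is at or above the gathering cutoff. All function names are illustrative.

```python
def parse_cutoffs(line):
    """Split a line such as 'Gathering cutoffs: -6 -6' into label and pair."""
    label, _, values = line.partition(":")
    sequence_cutoff, domain_cutoff = (float(v) for v in values.split())
    return label.strip(), (sequence_cutoff, domain_cutoff)


def load_cutoffs(lines):
    """Build a {label: (sequence_cutoff, domain_cutoff)} mapping."""
    return dict(parse_cutoffs(line) for line in lines)


def is_consistent(cutoffs):
    """Check the expected ordering: noise < gathering <= trusted."""
    gathering = cutoffs["Gathering cutoffs"][0]
    trusted = cutoffs["Trusted cutoffs"][0]
    noise = cutoffs["Noise cutoffs"][0]
    return noise < gathering <= trusted


def passes_gathering(score, cutoffs):
    """Assumed membership rule: whole-sequence score >= gathering cutoff."""
    return score >= cutoffs["Gathering cutoffs"][0]


# Values copied from the Cation_efflux record above.
CATION_EFFLUX = load_cutoffs([
    "Gathering cutoffs: -6 -6",
    "Trusted cutoffs: 6.90 6.90",
    "Noise cutoffs: -19.30 -19.30",
])

if __name__ == "__main__":
    print(is_consistent(CATION_EFFLUX))
    print(passes_gathering(6.9, CATION_EFFLUX))
    print(passes_gathering(-19.3, CATION_EFFLUX))
```

The ordering check is also a cheap way to catch digits that were split or dropped when a record was transcribed.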


CBD_6 




Cellulose binding domain 


Accession number: PF02018
Definition: Cellulose binding domain
Author: Bateman A
Alignment method of seed: Manual
Source of seed members: Chris Ponting
Gathering cutoffs: 19 0
Trusted cutoffs: 19.10 19.10
Noise cutoffs: 8.90 8.90
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97074498
Reference Title: Structure of the N-terminal cellulose-binding domain of
Reference Title: Cellulomonas fimi CenC determined by nuclear magnetic
Reference Title: resonance spectroscopy.
Reference Author: Johnson PE, Joshi MD, Tomme P, Kilburn DG, McIntosh LP;
Reference Location: Biochemistry 1996;35:14381-14394.
Database Reference: URL;
http://www.ocms.ox.ac.uk/~ponting/methmb/exampie.html;
Database Reference: SCOP; 1ulp; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference: PDB; 1ulo; 1; 149;
Database Reference: PDB; 1ulp; 1; 149;
Database Reference: PDB; 1cx1 A; 2; 6;
Database Reference: PDB; 1ulo; 150; 152;
Database Reference: PDB; 1ulp; 150; 152;
Database Reference: PDB; 1cx1 A; 7; 151;
Database reference: PFAMB; PB012497;
Database reference: PFAMB; PB041237;
Database reference: PFAMB; PB041605;
Number of members: 76


CBFD_NFYB_HMF


PDOC00578 


CBF/NF-Y subunits 
signatures 


Diverse DNA binding proteins are known to bind the CCAAT box, a common
cis-acting element found in the promoter and enhancer regions of a large
number of genes in eukaryotes. Amongst these proteins is one known as the
CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric transcription
factor that consists of two different components both needed for
DNA-binding.

The HAP protein complex of yeast binds to the upstream activation site of
cytochrome C iso-1 gene (CYC1) as well as other genes involved in
mitochondrial electron transport and activates their expression. It also
recognizes the sequence CCAAT and is structurally and evolutionarily related
to CBF.

The first subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in
budding yeast and as php3 in fission yeast, is a protein of 116 to 210
amino-acid residues which contains a highly conserved central domain of
about 90 residues. This domain seems to be involved in DNA-binding; we have
developed a signature pattern from its central part.

The second subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in
budding yeast and php2 in fission yeast, is a protein of 265 to 350
amino-acid residues which contains a highly conserved region of about 60
residues. This region, called the 'essential core' [2], seems to consist of
two subdomains: an N-terminal subunit-association domain and a C-terminal
DNA recognition domain. We have developed a signature pattern from a section
of the subunit-association domain.

Description of pattern (s) and/or profile(s) 

Consensus pattern C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-
[KRQ]-C

Sequences known to belong to this class detected by the pattern 
ALL CBF-A subunits. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E
Sequences known to belong to this class detected by the pattern 
ALL CBF-B subunits. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / Patterns and text revised. 

References

Benoist C., Mathis D.
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CbiX 




CbiX 


Accession number: PF01903
Definition: CbiX
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: -25 -25
Trusted cutoffs: -23.10 -23.10
Noise cutoffs: -35.10 -35.10
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98416126
Reference Title: Cobalamin (vitamin B12) biosynthesis: identification and
Reference Title: characterization of a Bacillus megaterium cobI operon.
Reference Author: Raux E, Lanois A, Warren MJ, Rambach A, Thermes C;
Reference Location: Biochem J 1998;335:159-166.
Database Reference: INTERPRO; IPR002762;
Database reference: PFAMB; PB040604;
Database reference: PFAMB; PB040610;
Database reference: PFAMB; PB041575;
Comment: The function of CbiX is uncertain, however it is found
Comment: in cobalamin biosynthesis operons and so may have a
Comment: related function. Some CbiX proteins contain a striking
Comment: histidine-rich region at their C-terminus, which suggests
Comment: that it might be involved in metal chelation [1].
Number of members: 6


cellulase 


PDOC00565 


Glycosyl hydrolases
family 5 signature


The microbial degradation of cellulose and xylans requires several types of
enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91)
(exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce
a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the
basis of sequence similarities, can be classified into families. One of these
families is known as the cellulase family A [3] or as the glycosyl hydrolases
family 5 [4,E1]. The enzymes which are currently known to belong to this
family are listed below.

- Endoglucanases from various species and strains of Bacillus.
- Butyrivibrio fibrisolvens endoglucanases 1 (end1) and A (celA).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB).
  This protein consists of two domains; it is the C-terminal domain, which
  has endoglucanase activity, which belongs to this family.
- Clostridium acetobutylicum endoglucanase (eglA).
- Clostridium cellulolyticum endoglucanases A (celccA) and D (celccD).
- Clostridium cellulovorans endoglucanase B (engB) and D (engD).
- Clostridium thermocellum endoglucanases B (celB), C (celC), E (celE), G
  (celG) and H (celH).
- Erwinia chrysanthemi endoglucanase Z (celZ).
- Fibrobacter succinogenes endoglucanase 3 (cel-3).
- Pseudomonas fluorescens endoglucanase C (celC).
- Pseudomonas solanacearum endoglucanase (egl).
- Robillarda strain Y-20 endoglucanase I.
- Ruminococcus albus endoglucanases I (EG-I), A (celA), and B (celB).
- Ruminococcus flavefaciens cellodextrinase A (celA).
- Ruminococcus flavefaciens endoglucanase E (celE).
- Streptomyces lividans endoglucanase.
- Thermomonospora fusca endoglucanase E-5 (celE).
- Trichoderma reesei endoglucanase II (EGLII).
- Xanthomonas campestris endoglucanase (engxcA).

As well as:

- Baker's yeast glucan 1,3-beta-glucosidase I/II (EC 3.2.1.58) (EXG1).
- Baker's yeast glucan 1,3-beta-glucosidase 2 (EC 3.2.1.58) (EXG2).
- Baker's yeast sporulation-specific glucan 1,3-beta-glucosidase (SPR1).
- Caldocellum saccharolyticum beta-mannanase (EC 3.2.1.78) (manA).
- Yeast hypothetical protein YBR056w.
- Yeast hypothetical protein YIR007w.

One of the conserved regions in these enzymes contains a 
conserved glutamic 

acid residue which is potentially involved [5] in the catalytic 
mechanism. 

We use this region as a signature pattern. 



Description of pattern(s) and/or profile(s)

Consensus pattern [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-
x-N-E-[PV]-[RHDNSTLIVFY] [E is a putative active site residue]
Sequences known to belong to this class detected by the pattern
ALL, except for Robillarda Y-20 endoglucanase I whose sequence
is known to be incorrect and yeast YBR056w.
Other sequence(s) detected in SWISS-PROT 22.
Expert(s) to contact by email
Henrissat B. bernie@afmb.cnrs-mrs.fr

Last update 

November 1997 / Pattern and text revised. 
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CH 



PDOC00019 



Actinin-type actin-binding 
domain signatures 



[E1] 

http://www.expasy.ch/cgi-bin/lists?glycosid.txt



Alpha-actinin is an F-actin cross-linking protein which is thought to anchor
actin to a variety of intracellular structures [1]. The actin-binding domain
of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many
different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin).
- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD)
  and which may play a role in anchoring the cytoskeleton to the plasma
  membrane.
- In the slime mold gelation factor (or ABP-120).
- In actin-binding protein ABP-280 (or filamin), a protein that links actin
  filaments to membrane glycoproteins.
- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from
  the above proteins in that it contains two tandem copies of the
  actin-binding domain and that these copies are located in the C-terminal
  part of the protein.

We selected two conserved regions as signature patterns for this type of
domain. The first of these regions is located at the beginning of the domain,
while the second one is located in the central section and has been shown to
be essential for the binding of actin.

Description of pattern(s) and/or profile(s) 

Consensus pattern [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 25. 

Consensus pattern [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-
[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-[LIVM]-
[LIVMT]-W-x-[LIVM](2)

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Patterns and text revised. 

References 
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chitinase_2 


PDOC00839 


Chitinases family 18 
active site 


Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the
beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the view
point of sequence similarity chitinases belong to either family 18 or 19 in
the classification of glycosyl hydrolases [2,E1]. Chitinases of family 18
(also known as classes III or V) group a variety of proteins:
a) Chitinases from:

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).
- Baculoviruses (Autographa californica nuclear polyhedrosis virus).



b) Other proteins: 

- Hevamine, a rubber tree protein with chitinase and lysozyme 
activities. 

- Kluyveromyces lactis killer toxin alpha subunit, which acts as a 
chitinase. 

- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases
(EC 3.2.1.96).

- Mammalian di-N-acetylchitobiase which is involved in the 
degradation of 

asparagine-linked glycoproteins. 

- Human cartilage glycoprotein Gp-39. 

- Jack bean concanavalin B (conB), a protein that has lost its 
catalytic 

activity. 

Site directed mutagenesis experiments [3] and crystallographic 
data [4,5] have 

shown that a conserved glutamate is involved in the catalytic 
mechanism and 

probably acts as a proton donor. This glutamate is at the 
extremity of the 

best conserved region in these proteins. 



Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-
x-E [E is the active site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for conB which has a Gln instead of the active site
Glu.

Other sequence(s) detected in SWISS-PROT NONE. 

Expert(s) to contact by email 

Neuhaus J.-M. jean-marc.neuhaus@bota.unine.ch 

Henrissat B. bernie@afmb.cnrs-mrs.fr 

Last update 

November 1997 / Text revised.

K.S., Vorgias C.E.
Structure 2:1169-1180(1994).

van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W.
Structure 2:1181-1189(1994).

http://www.expasy.ch/cgi-bin/lists?glycosid.txt


Choline kinase 




Choline/ethanolamine
kinase


Accession number: PF01633
Definition: Choline/ethanolamine kinase
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1165 (release 4.1)
Gathering cutoffs: 25 25
Trusted cutoffs: 242.90 242.90
Noise cutoffs: -85.90 -85.90
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98175949
Reference Title: Expression, purification, and characterization of choline
Reference Title: kinase, product of the CKI gene from Saccharomyces
Reference Title: cerevisiae.
Reference Author: Kim KH, Voelker DR, Flocco MT, Carman GM;
Reference Location: J Biol Chem 1998;273:6844-6852.
Database Reference: INTERPRO; IPR002573;
Comment: Choline kinase catalyses the committed step in the synthesis of
Comment: phosphatidylcholine by the CDP-choline pathway [1].
Number of members: 22


Chorion 




Chorion protein 


Accession number: PF01723
Definition: Chorion protein
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1914 (release 4.1)
Gathering cutoffs: -46 -46
Trusted cutoffs: -43.70 -43.70
Noise cutoffs: -49.00 -49.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 95333194
Reference Title: Sequence analysis of a small early chorion gene subfamily
Reference Title: interspersed within the late gene locus in Bombyx mori.
Reference Author: Kravariti L, Lecanidou R, Rodakis GC;
Reference Location: J Mol Evol 1995;41:24-33.
Reference Number: [2]
Reference Medline: 86313609
Reference Title: Evolution of the silk moth chorion gene superfamily: gene
Reference Title: families CA and CB.
Reference Author: Lecanidou R, Rodakis GC, Eickbush TH, Kafatos FC;
Reference Location: Proc Natl Acad Sci U S A 1986;83:6514-6518.
Database Reference: INTERPRO; IPR002635;
Database reference: PFAMB; PB009425;
Comment: This family consists of the chorion superfamily proteins
Comment: classes A, B, CA, CB and high-cysteine HCB from silk,
Comment: gypsy and polyphemus moths.
Comment: The chorion proteins make up the moth's egg shell, a complex
Comment: extracellular structure [2].
Number of members: 35
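The Pfam records in this table are lists of labelled lines in which a label such as Reference Title or Comment repeats when the value runs over several lines. The following is a minimal sketch of how such a record could be read into a dictionary, assuming one "label: value" pair per line. The function names are illustrative, and labels are lower-cased only because the records mix "Database Reference" and "Database reference".

```python
def parse_record(text):
    """Collect the values of each label, in order, as {label: [values]}."""
    fields = {}
    for line in text.splitlines():
        label, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that carry no label
        fields.setdefault(label.strip().lower(), []).append(value.strip())
    return fields


def joined(fields, label):
    """Join the continuation lines of a repeated label into one string."""
    return " ".join(fields.get(label.lower(), []))


# Lines copied from the Chorion record above.
SAMPLE = """Accession number: PF01723
Definition: Chorion protein
Reference Location: J Mol Evol 1995;41:24-33.
Comment: This family consists of the chorion superfamily proteins
Comment: classes A, B, CA, CB and high-cysteine HCB from silk,
Comment: gypsy and polyphemus moths.
Number of members: 35"""

if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print(record["accession number"][0])
    print(joined(record, "Comment"))
```

Splitting on the first colon only keeps values such as "J Mol Evol 1995;41:24-33." intact, since the later colons belong to the citation.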


Chorismate_mut




Chorismate mutase 


Accession number: PF01817
Definition: Chorismate mutase
Author: Bateman A
Alignment method of seed: Manual
Source of seed members: PSI-BLAST 1ecm
Gathering cutoffs: 5 5
Trusted cutoffs: 5.10 5.10
Noise cutoffs: -19.90 -19.90
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 95062155
Reference Title: The crystal structure of allosteric chorismate mutase at
Reference Title: 2.2-A resolution.
Reference Author: Xue Y, Lipscomb WN, Graf R, Schnappauf G, Braus G;
Reference Location: Proc Natl Acad Sci U S A 1994;91:10814-10818.
Reference Number: [2]
Reference Medline: 98307941
Reference Title: Tyrosine and tryptophan act through the same binding site
Reference Title: at the dimer interface of yeast chorismate mutase.
Reference Author: Schnappauf G, Krappmann S, Braus GH;
Reference Location: J Biol Chem 1998;273:17012-17017.
Reference Number: [3]
Reference Medline: 98165805
Reference Title: Chorismate mutase-prephenate dehydratase from Escherichia
Reference Title: coli. Study of catalytic and regulatory domains using
Reference Title: genetically engineered proteins.
Reference Author: Zhang S, Pohnert G, Kongsaeree P, Wilson DB, Clardy J,
Reference Author: Ganem B;
Reference Location: J Biol Chem 1998;273:6248-6253.
Database Reference: SCOP; 1csm; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference: INTERPRO; IPR002701;
Database Reference: PDB; 1ecm B; 6; 89;
Database Reference: PDB; 1ecm A; 5; 89;
Database Reference: PDB; 1csm A; 133; 162;
Database Reference: PDB; 3csm A; 133; 243;
Database Reference: PDB; 3csm B; 133; 243;
Database Reference: PDB; 4csm A; 133; 243;
Database Reference: PDB; 4csm B; 133; 243;
Database Reference: PDB; 5csm A; 133; 243;
Database Reference: PDB; 2csm A; 133; 246;
Comment: Chorismate mutase EC:5.4.99.5 catalyses the conversion of
Comment: chorismate to prephenate in the pathway of tyrosine and
Comment: phenylalanine biosynthesis. This enzyme is negatively
Comment: regulated by tyrosine, tryptophan and phenylalanine [2,3].
Number of members: 28
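The PDB cross-references in these records pack a structure identifier, an optional chain letter and two numbers into one semicolon-separated line. The following is a minimal sketch of how such a line could be unpacked. It assumes that the two numbers are the first and last residue of the region covered by the family; the record itself does not say so. The function names are illustrative.

```python
def parse_pdb_reference(line):
    """Return (pdb_id, chain, start, end) from a 'PDB; ...' reference line."""
    payload = line.split("PDB;", 1)[1]
    fields = [field.strip() for field in payload.split(";") if field.strip()]
    identifier = fields[0].split()
    pdb_id = identifier[0]
    chain = identifier[1] if len(identifier) > 1 else None  # chain may be absent
    return pdb_id, chain, int(fields[1]), int(fields[2])


def region_length(start, end):
    """Number of residues in an inclusive start..end range."""
    return end - start + 1


if __name__ == "__main__":
    for line in (
        "Database Reference: PDB; 1ecm B; 6; 89;",
        "Database Reference: PDB; 3csm A; 133; 243;",
    ):
        pdb_id, chain, start, end = parse_pdb_reference(line)
        print(pdb_id, chain, region_length(start, end))
```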








CNhydrolase 


PDOC00712; 
PDOC00943 


Nitrilases / cyanide 
hydratase signatures; 
Uncharacterized protein 
family UPF0012 
signature 


Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their
corresponding acids and ammonia. They are widespread in microbes as well as in
plants where they convert indole-3-acetonitrile to the hormone
indole-3-acetic acid. A conserved cysteine has been shown [1,2] to be
essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom.

Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic
fungi, it is used to avoid the toxic effect of cyanide released by wounded
plants [3]. The sequence of cyanide hydratase is evolutionarily related to
that of nitrilases.

Yeast hypothetical proteins YIL164C and YIL165C also belong to this family.

As signature patterns for these enzymes, we selected two conserved regions.
The first is located in the N-terminal section while the second, which
contains the active site cysteine, is located in the central section.



Description of pattern (s) and/or profile(s) 

Consensus pattern G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-
Y-P

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-
[LIVMFYS]-x-[KR] [C is the active site residue]

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / Patterns and text revised. 

References 

[1]

Kobayashi M., Izui H., Nagasawa T., Yamada H.
Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).

[2]

Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H.
J. Biol. Chem. 267:20746-20751(1992).

[3]

Wang P., Vanetten H.D.
Biochem. Biophys. Res. Commun. 187:1048-1054(1992).

The following uncharacterized proteins have been shown [1] to share regions of
similarities:

- Yeast chromosome X hypothetical protein YJL126w.
- Yeast chromosome XII hypothetical protein YLR351c.
- Fission yeast hypothetical protein SpAC26A3.11.
- Escherichia coli hypothetical protein ybeM.
- Bacillus subtilis hypothetical protein yhcX.
- Mycobacterium tuberculosis hypothetical protein MtCY20G9.06c.
- Synechocystis strain PCC 6803 hypothetical protein sll0601.
- A Pseudomonas fluorescens hypothetical protein in pqqF 5'region.
- A Staphylococcus hypothetical protein in agr operon.

Except for yhcX which is larger, these are proteins of about 30 to 35 Kd which
contain, in their central section, a well conserved region centered on a
cysteine residue.



Description of pattern(s) and/or profile(s) 

Consensus pattern [GTA]-x(2)-[IVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / First entry. 

References


CorA-like Mg2+
transporter protein


Accession number: PF01544

Definition: CorA-like Mg2+ transporter protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_944 (release 4.0) 

Gathering cutoffs: -62 -62

Trusted cutoffs: -5.90 -5.90

Noise cutoffs: -86.20 -86.20 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98448512

Reference Title: The CorA magnesium transporter gene 
family. 

Reference Author: Kehres DG, Lawyer CH, Maguire ME; 

Reference Location: Microb Comp Genomics 1998;3:151-169. 

Reference Number: [2] 

Reference Medline: 99003207 

Reference Title: The CorA Mg2+ transport protein of Salmonella typhimurium.

Reference Title: Mutagenesis of conserved residues in the third membrane

Reference Title: domain identifies a Mg2+ pore.
Reference Author: Smith RL, Szegedy MA, Kucharski LM, Walker C, Wiet RM,

Reference Author: Redpath A, Kaczmarek MT, Maguire ME;

Reference Location: J Biol Chem 1998;273:28663-28669.

Database Reference: INTERPRO; IPR002523;

Database Reference: PFAMB; PB041399;

Comment: The CorA transport system is the primary Mg2+ influx system of Salmonella

Comment: typhimurium and Escherichia coli. CorA is virtually ubiquitous in the

Comment: Bacteria and Archaea. There are also eukaryotic relatives of this protein.
Number of members: 25 


Cysknot 


PDOC00234 


Glycoprotein hormones 
beta chain signatures 


Glycoprotein hormones [1,2] (or gonadotropins) are a family of proteins which
include the mammalian hormones follitropin (FSH), lutropin (LSH), thyrotropin
(TSH) and chorionic gonadotropin (CG), as well as at least two forms of fish
gonadotropins. All these hormones consist of two glycosylated chains (alpha
and beta). In mammalian gonadotropins, the alpha chain is identical in the
four types of hormones but the beta chains, while homologous, are different.

The beta chains are proteins of about 100 to 140 amino acid residues which
contain twelve conserved cysteines all involved in disulfide bonds [3], as
shown in the following schematic representation.

[Schematic: disulfide bonds linking the twelve conserved cysteines of the beta chain]

xxxCxxxxxxxCxCxxCxCxxxxxxxCxxxxxxxxCxxxxxxxCxCxCxxCxxxxxCxxxxxxxxxxx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the patterns.

We have developed two patterns for these hormones. The first one, located in
the N-terminal section, is a region which has been said to be involved in the
association between the two chains of the hormones. The second pattern
consists of a cluster of five conserved cysteines in the C-terminal section.

Description of pattern(s) and/or profile(s)

Consensus pattern C-[STAGM]-G-[HFYL]-C-x-[ST]
[The two C's are involved in disulfide bonds]

Sequences known to belong to this class detected by the pattern
ALL, except for rat beta-FSH which has Glu in position 2 of the pattern.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C
[The five C's are involved in disulfide bonds]

Sequences known to belong to this class detected by the pattern
ALL, except for 5 sequences.

Other sequence(s) detected in SWISS-PROT NONE.

Expert(s) to contact by email
Lapthorn A. adrian@chem.gla.ac.uk

Last update

July 1998 / Patterns and text revised.

References

[1]

Pierce J.G., Parsons T.F.
Annu. Rev. Biochem. 50:465-495(1981).

[2]

Stockell Hartree A., Renwick A.G.C.
Biochem. J. 287:665-679(1992).

[3]

Lapthorn A.J., Harris D.C., Littlejohn A., Lustbader J.W., Canfield
R.E., Machin K.J., Morgan F.J., Isaacs N.W.
Nature 369:455-461(1994).


cytochrome_b_C


PDOC00171


Cytochrome b/b6
signatures


In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is
a component of respiratory chain complex III (EC 1.10.2.2), also known as the
bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and
cyanobacteria, there is an analogous protein, cytochrome b6, a component of the
plastoquinone-plastocyanin reductase (EC 1.10.99.1), also known as the b6f
complex.

Cytochrome b/b6 [1,2] is an integral membrane protein of approximately 400
amino acid residues that probably has 8 transmembrane segments. In plants and
cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB
and petD genes. The sequence of petB is colinear with the N-terminal part of
mitochondrial cytochrome b, while petD corresponds to the C-terminal part.
Cytochrome b/b6 non-covalently binds two heme groups, known as b562 and b566.
Four conserved histidine residues are postulated to be the ligands of the
iron atoms of these two heme groups.

Apart from regions around some of the histidine heme ligands, there are a few
conserved regions in the sequence of b/b6. The best conserved of these regions
includes an invariant P-E-W triplet which lies in the loop that separates the
fifth and sixth transmembrane segments. It seems to be important for electron
transfer at the ubiquinone redox site, called Qz or Qo (where o stands for
outside), located on the outer side of the membrane.

A schematic representation of the structure of cytochrome b/b6 is shown below.

          +-------Fe-b562------+
          | +-----Fe-b566------|-+
          | |                  | |
xxxxxxxxxxHxHxxxxxxxxxxxxxxxxxxHxHxxxxxxxxxxxxxxPEWxxxxx
<---------------------Cytochrome-b--------------------->
<-------Cytochrome-b6-petB--------><-Cytochrome-b6-petD>

We developed two signature patterns for cytochrome b/b6. The first includes
the first conserved histidine of b/b6, which is a heme b562 ligand; the second
includes the conserved PEW triplet.

Description of pattern(s) and/or profile(s) 

Consensus pattern [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H 
[H is a heme b562 ligand] 

Sequences known to belong to this class detected by the pattern 

ALL, except for 5 sequences. 

Other sequence(s) detected in SWISS-PROT 15. 

Consensus pattern P-[DE]-W-[FY]-[LFY](2) 

Sequences known to belong to this class detected by the pattern 

ALL, except for Odocoileus hemionus (mule deer) and 

Paramecium tetraurelia cytochrome b. 

Other sequence(s) detected in SWISS-PROT 1.

Last update 

November 1995 / Patterns and text revised. 

References 

[1] 

Howell N. 

J. Mol. Evol. 29:157-169(1989). 
[2] 

Esposti M.D., de Vries S., Crimi M., Ghelli A., Patarnello T., Meyer 
A. 

Biochim. Biophys. Acta 1143:243-271(1993).


cytochrome_b_N


PDOC00171 


Cytochrome b/b6 
signatures 


In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is
a component of respiratory chain complex III (EC 1.10.2.2), also known as the
bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and
cyanobacteria, there is an analogous protein, cytochrome b6, a component of the
plastoquinone-plastocyanin reductase (EC 1.10.99.1), also known as the b6f
complex.

Cytochrome b/b6 [1,2] is an integral membrane protein of approximately 400
amino acid residues that probably has 8 transmembrane segments. In plants and
cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB
and petD genes. The sequence of petB is colinear with the N-terminal part of
mitochondrial cytochrome b, while petD corresponds to the C-terminal part.
Cytochrome b/b6 non-covalently binds two heme groups, known as b562 and b566.
Four conserved histidine residues are postulated to be the ligands of the
iron atoms of these two heme groups.

Apart from regions around some of the histidine heme ligands, there are a few
conserved regions in the sequence of b/b6. The best conserved of these regions
includes an invariant P-E-W triplet which lies in the loop that separates the
fifth and sixth transmembrane segments. It seems to be important for electron
transfer at the ubiquinone redox site, called Qz or Qo (where o stands for
outside), located on the outer side of the membrane.

A schematic representation of the structure of cytochrome b/b6 is shown below.

          +-------Fe-b562------+
          | +-----Fe-b566------|-+
          | |                  | |
xxxxxxxxxxHxHxxxxxxxxxxxxxxxxxxHxHxxxxxxxxxxxxxxPEWxxxxx
<-------Cytochrome-b6-petB--------><-Cytochrome-b6-petD>

We developed two signature patterns for cytochrome b/b6. The first includes
the first conserved histidine of b/b6, which is a heme b562 ligand; the second
includes the conserved PEW triplet.

Description of pattern(s) and/or profile(s)

Consensus pattern [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H
[H is a heme b562 ligand]

Sequences known to belong to this class detected by the pattern 

ALL, except for 5 sequences. 

Other sequence(s) detected in SWISS-PROT 15. 

Consensus pattern P-[DE]-W-[FY]-[LFY](2) 

Sequences known to belong to this class detected by the pattern 

ALL, except for Odocoileus hemionus (mule deer) and 

Paramecium tetraurelia cytochrome b. 

Other sequence(s) detected in SWISS-PROT 1.

Last update 

November 1995 / Patterns and text revised. 

References 

[1] 

Howell N.

J. Mol. Evol. 29:157-169(1989). 
[2] 

Esposti M.D., de Vries S., Crimi M., Ghelli A., Patarnello T., Meyer A.

Biochim. Biophys. Acta 1143:243-271(1993). 


cytochrome_c


PDOC00169 


Cytochrome c family 
heme-binding site 
signature 


In proteins belonging to cytochrome c family [1], the heme group is covalently
attached by thioether bonds to two conserved cysteine residues. The consensus
sequence for this site is Cys-X-X-Cys-His and the histidine residue is one of
the two axial ligands of the heme iron. This arrangement is shared by all
proteins known to belong to cytochrome c family, which presently includes
cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction
center cytochrome c.

Description of pattern(s) and/or profile(s)

Consensus pattern C-{CPWHF}-{CPWR}-C-H-{CFYW}
Sequences known to belong to this class detected by the pattern
ALL, except for four cytochrome c's which lack the first thioether bond.

Other sequence(s) detected in SWISS-PROT 454. 

Note: some cytochrome c's have more than a single bound heme group: c4 has 2,
c7 has 3, c3 has 4, the reaction center has 4, and cc3/Hmc has 16!
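
Because one protein can carry several heme groups, it can be useful to count every occurrence of the heme-binding signature rather than just test for its presence. The sketch below is one possible way to do that in Python, assuming the consensus pattern C-{CPWHF}-{CPWR}-C-H-{CFYW} is written as a regular expression with negated character classes. The function name `count_heme_sites` and the test sequences are hypothetical.

```python
import re

# C-{CPWHF}-{CPWR}-C-H-{CFYW} as a regex; the lookahead lets sites be
# counted even if two candidate matches were to overlap.
HEME_SITE = re.compile(r"(?=(C[^CPWHF][^CPWR]CH[^CFYW]))")

def count_heme_sites(sequence: str) -> int:
    """Count candidate heme-binding sites in a protein sequence (hypothetical helper)."""
    return len(HEME_SITE.findall(sequence.upper()))

# Made-up sequences for illustration only; not real cytochromes.
two_sites = count_heme_sites("AAACAACHAAAACGGCHKAAA")
no_site = count_heme_sites("AAACPACHAAA")   # P is excluded at position 2
```

A count obtained this way is only a pattern hit count. As the entry notes, the pattern also detects several hundred other SWISS-PROT sequences, so hits need independent confirmation.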
Last update 

June 1992 / Text revised. 

References 

[1]

Mathews F.S.

Prog. Biophys. Mol. Biol. 45:1-56(1985).


DAHP_synth_2 




Class-II DAHP
synthetase family 


Members of this family are aldolase enzymes that catalyse the 
first step of the shikimate pathway. 

These polypeptides can be useful in the synthesis of aromatic 
compounds, such as amino acids, antibiotics, secondary 
metabolites, etc. Such synthesis can occur either in vitro or in 
vivo. 


Dala_Dala_ligas




D-ala D-ala ligase 


Accession number: PF01820 

Definition: D-ala D-ala ligase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST 2dln 

Gathering cutoffs: 25 25 

Trusted cutoffs: 44.90 26.60 

Noise cutoffs: 21.50 18.90 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1]

Reference Medline: 97207065

Reference Title: D-alanine:D-alanine ligase: phosphonate and phosphinate

Reference Title: intermediates with wild type and the Y216F mutant.

Reference Author: Fan C, Park IS, Walsh CT, Knox JR;
Reference Location: Biochemistry 1997;36:2531-2538.
Database Reference: SCOP; 2dln; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference: INTERPRO; IPR000291;

Database Reference: PDB; 1iov ; 3; 303;

Database Reference: PDB; 1iow ; 3; 303;

Database Reference: PDB; 2dln ; 3; 303;

Comment: This family contains D-alanine-D-alanine 

ligase enzymes EC:6.3.2.4. 

Number of members: 80 
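
The gathering, trusted and noise cutoffs listed in the Pfam-style entries each give two numbers. The sketch below assumes, following the usual Pfam convention, that the first is a per-sequence bit-score threshold and the second a per-domain threshold, and that a hit counts as a family member when it reaches both gathering values. The names `Cutoffs` and `passes` are hypothetical, and the values are copied from the D-ala D-ala ligase entry above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cutoffs:
    sequence: float  # per-sequence bit score (assumed meaning of the first number)
    domain: float    # per-domain bit score (assumed meaning of the second number)

# Values from the D-ala D-ala ligase entry (PF01820).
GATHERING = Cutoffs(25.0, 25.0)
TRUSTED = Cutoffs(44.90, 26.60)
NOISE = Cutoffs(21.50, 18.90)

def passes(seq_score: float, dom_score: float,
           cutoffs: Cutoffs = GATHERING) -> bool:
    """Return True when both scores reach the given thresholds."""
    return seq_score >= cutoffs.sequence and dom_score >= cutoffs.domain

# A hit scoring at the trusted cutoffs clears the gathering threshold,
# while one scoring at the noise cutoffs does not.
trusted_hit_ok = passes(TRUSTED.sequence, TRUSTED.domain)
noise_hit_ok = passes(NOISE.sequence, NOISE.domain)
```

Under this reading the trusted values sit at or above the gathering values and the noise values below them, which is the ordering the entry shows.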


DHPS 


PDOC00630 


Dihydropteroate 
synthase signatures 


All organisms require reduced folate cofactors for the synthesis of a variety
of metabolites. Most microorganisms must synthesize folate de novo because
they lack the active transport system of higher vertebrate cells which allows
these organisms to use dietary folates. Enzymes that are involved in the
biosynthesis of folates are therefore the target of a variety of antimicrobial
agents such as trimethoprim or sulfonamides.

Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of
6-hydroxymethyl-7,8-dihydropteridine pyrophosphate to para-aminobenzoic acid
to form 7,8-dihydropteroate. This is the second step in the three-step
pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate.
DHPS is the target of sulfonamides, which are substrate analogs that compete
with para-aminobenzoic acid.

Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino
acid residues which is either chromosomally encoded or found on various
antibiotic resistance plasmids. In the lower eukaryote Pneumocystis carinii,
DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme
(gene fas) [2].

We developed two signature patterns for DHPS. The first signature is located
in the N-terminal section of these enzymes, while the second signature is
located in the central section.
Description of pattern(s) and/or profile(s)

Consensus pattern [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Patterns and text revised. 

References 

[1] 

Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P.
J. Bacteriol. 172:7211-7226(1990). 

[2] 

Volpes F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J.

Gene 112:213-218(1992). 


DHquinase_I


PDOC00788 


Dehydroquinase class I
active site 


3-dehydroquinate dehydratase (EC 4.2.1.10), or dehydroquinase, catalyzes the
conversion of 3-dehydroquinate into 3-dehydroshikimate. It is the third step
in the shikimate pathway for the biosynthesis of aromatic amino acids from
chorismate. Two classes of dehydroquinases exist, known as types I and II. The
best studied type I enzyme is from Escherichia coli (gene aroD) and related
bacteria where it is a homodimeric protein of a chain of about 250 residues.
In fungi, dehydroquinase is part of a multifunctional enzyme which catalyzes
five consecutive steps in the shikimate pathway. In aroD, it has been shown
[1] that a histidine is involved in the catalytic mechanism; we used the
region around this residue as a signature pattern.

Description of pattern(s) and/or profile(s) 

Consensus pattern D-[LIVM]-[DE]-[LIVMN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN]
[H is the active site residue]

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 
References

[1]

Deka R.K., Kleanthous C., Coggins J.R.
J. Biol. Chem. 267:22237-22242(1992).


Diphthamide_syn




Putative diphthamide 
synthesis protein 


Accession number: PF01866

Definition: Putative diphthamide synthesis protein 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 44.90 44.90 

Noise cutoffs: -174.70 -174.70

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 96183112

Reference Title: A cDNA from the ovarian cancer critical region of deletion

Reference Title: on chromosome 17p13.3.
Reference Author: Phillips NJ, Zeigler MR, Deaven LL;
Reference Location: Cancer Lett 1996;102:85-90.
Reference Number: [2]
Reference Medline: 94010339

Reference Title: Diphthamide synthesis in Saccharomyces cerevisiae:

Reference Title: structure of the DPH2 gene.
Reference Author: Mattheakis LC, Sor F, Collier RJ;
Reference Location: Gene 1993;132:149-154.
Database Reference INTERPRO; IPR002728; 
Comment: Swiss:Q16439 is a candidate tumour suppressor gene [1]. DPH2 from

Comment: yeast Swiss:P32461 [2], which confers resistance to diphtheria toxin

Comment: has been found to be involved in diphthamide synthesis. Diphtheria

Comment: toxin inhibits eukaryotic protein synthesis by ADP-ribosylating

Comment: diphthamide, a posttranslationally modified histidine residue present

Comment: in EF2. The exact function of the members of this family is

Comment: unknown.
Number of members: 12


Disintegrin 

- 


PDOC00351 


Disintegrins signature 


Disintegrins [1,2] are snake venom proteins which inhibit fibrinogen
interaction with platelet receptors expressed on the glycoprotein IIb-IIIa
complex. They act by binding to the integrin glycoprotein IIb-IIIa receptor on
the platelet surface and inhibit aggregation induced by ADP, thrombin,
platelet-activating factor and collagen.

Disintegrins are peptides of about 70 amino acid residues that contain many
cysteines all involved in disulfide bonds [3]. Disintegrins contain an
Arg-Gly-Asp (RGD) sequence, a recognition site of many adhesion proteins. The
RGD sequence of disintegrins is postulated to interact with the glycoprotein
IIb-IIIa complex.

The sequences of disintegrins from different snake species are known. These
proteins are known as: albolabrin, applagin, barbourin, batroxostatin,
bitistatin, echistatin, elegantin, eristicophin, flavoridin, halysin, kistrin,
tergeminin and triflavin.

Some other proteins are known to contain a disintegrin domain:

- Some snake venom zinc metalloproteinases [4] consist of an N-terminal
  catalytic domain fused to a disintegrin domain. Such is the case for
  trimerelysin I (HR1B), atrolysin e (Ht-e) and trigramin. It has been
  suggested that these proteinases are able to cleave themselves from the
  disintegrin domains and that the latter may arise from such a
  post-translational processing.
- The beta-subunit of guinea pig sperm surface protein PH30 [5]. PH30 is a
  protein involved in sperm-egg fusion. The beta subunit contains a
  disintegrin at the N-terminal extremity.
- Mammalian epididymal protein 1 (EAP I) [6]. EAP I is associated with the
  sperm membrane and may play a role in sperm maturation. Structurally, EAP I
  consists of an N-terminal domain, followed by a zinc metalloproteinase
  domain, a disintegrin domain, and a large C-terminal domain that contains a
  transmembrane region.

The schematic representation of the structure of a typical disintegrin is
shown below:

[Schematic: disulfide bonds linking the conserved cysteines of a typical disintegrin]

xxxxxCxCxxxxxxCCxxxxCxxxxxxxCxxxxCCxxCxxxxxxxxCxxxRGDxxxxxCxxxxxxCxxxxxxx
                            ********************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern. 

As a signature pattern for disintegrins, we selected a conserved central
region that contains five of the cysteines involved in disulfide bonds.

Description of pattern(s) and/or profile(s)

Consensus pattern C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1992 / Pattern and text revised.


DLH




Dienelactone hydrolase 
family 


Accession number: PF01738 

Definition: Dienelactone hydrolase family 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_757 (release 4.2)

Gathering cutoffs: 15 0

Trusted cutoffs: 15.60 3.10

Noise cutoffs: 14.40 14.40

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 90339491

Reference Title: Refined structure of dienelactone hydrolase at 1.8 A.

Reference Author: Pathak D, Ollis D;
Reference Location: J Mol Biol 1990;214:497-525.
Database Reference: SCOP; 1din; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference: INTERPRO; IPR002925;
Database Reference: PDB; 1din ; 16; 232;
Database Reference: PFAMB; PB004640;
Database Reference: PFAMB; PB041131;
Database Reference: PFAMB; PB041469;
Number of members: 42


DNA_mis_repair


PDOC00057 


DNA mismatch repair 
proteins mutL / hexB / 
PMS1 signature 


Mismatch repair contributes to the overall fidelity of DNA replication [1]. It
involves the correction of mismatched base pairs that have been missed by the
proofreading element of the DNA polymerase complex. The sequences of some
proteins involved in mismatch repair in different organisms have been found to
be evolutionarily related. These proteins are:

- Escherichia coli and Salmonella typhimurium mutL protein [2]. MutL is
  required for dam-dependent methyl-directed DNA repair.
- Streptococcus pneumoniae hexB protein [3]. The Hex system is nick directed.
- Yeast proteins PMS1 and MLH1 [4].
- Human protein MLH1 [5] which is involved in a form of familial hereditary
  nonpolyposis colon cancer (HNPCC).

As a signature pattern for this class of mismatch repair proteins we selected
a perfectly conserved heptapeptide which is located in the N-terminal section
of these proteins.

Description of pattern(s) and/or profile(s) 
Consensus pattern G-F-R-G-E-A-L 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

November 1995 / Pattern and text revised. 

References 

[1] 

Modrich P. 

Annu. Rev. Biochem. 56:435-466(1987).

[2]

Mankovich J.A., McIntyre C.A., Walker G.C.
J. Bacteriol. 171:5325-5331(1989).

[3]

Prudhomme M., Martin B., Mejean V., Claverys J.-P.
J. Bacteriol. 171:5332-5338(1989). 

[4] 

Prolla T.A., Christie D., Liskay R.M. 
Mol. Cell. Biol. 14:407-415(1994). 

[5] 

Bronner C.E., Baker S.M., Morrison P.T., Warren G., Smith L.G.,
Lescoe M.K., Kane M., Earabino C., Lipford J., Lindblom A.,
Tannergard P., Bollag R.J., Godwin A.R., Ward D.C.,
Nordenskjold M., Fishel R., Kolodner R.D., Liskay R.M.
Nature 368:258-261(1994).


DNA_primase_S


DNA primase small
subunit


Accession number: PF01896

Definition: DNA primase small subunit

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 198.40 198.40

Noise cutoffs: -120.80 -120.80 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 91219475 

Reference Title: Mutations in conserved yeast DNA primase domains impair DNA

Reference Title: replication in vivo.

Reference Author: Francesconi S, Longhese MP, Piseri A, Santocanale C,

Reference Author: Lucchini G, Plevani P;

Reference Location: Proc Natl Acad Sci U S A 1991;88:3877-3881.

Database Reference: INTERPRO; IPR002755;

Comment: DNA primase synthesizes the RNA primers for the Okazaki

Comment: fragments in lagging strand DNA synthesis. DNA primase

Comment: is a heterodimer of large and small subunits.
Number of members: 14 


DnaB 




DnaB-like helicase


Members of this family comprise DNA replication enzymes which unwind the
helix. Generally, such polypeptides are ATPases which move at the replication
fork, disrupting hydrogen bonds. Such proteins are useful for DNA replication
in vivo and/or in vitro.


DnaJ_C


DnaJ C terminal region


Accession number: PF01556

Definition: DnaJ C terminal region

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_342 (release 4.0)

Gathering cutoffs: -24 -24

Trusted cutoffs: -22.60 -22.60

Noise cutoffs: -25.50 -25.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98308847 

Reference Title: The J-domain family and the recruitment of chaperone power.

Reference Author: Kelley WL;

Reference Location: Trends Biochem Sci 1998;23:222-227.

Database Reference: INTERPRO; IPR002939;

Database Reference: PFAMB; PB013976;

Comment: This family consists of the C terminal region from the DnaJ

Comment: protein. Although the function of this region is unknown, it

Comment: is always found associated with DnaJ and DnaJ_CXXCXGXG.

Comment: DnaJ is a chaperone associated with the Hsp70 heat-shock

Comment: system involved in protein folding and renaturation after stress.
Number of members: 116 


DnaJ_CXXCXGXG 


PDOC00553 


dnaJ domains signatures 
and profile 


The prokaryotic heat shock protein dnaJ interacts with the chaperone
hsp70-like dnaK protein [1]. Structurally, the dnaJ protein consists of an
N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a
glycine-rich region ('G' domain) of about 30 residues, a central domain
containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal
region of 120 to 170 residues. Such a structure is shown in the following
schematic representation:

+------------+-+-------+-+----------+------------+
| N-terminal | | Gly-R | | CXXCXGXG | C-terminal |
+------------+-+-------+-+----------+------------+

It has been shown [2] that the 'J' domain as well as the 'CRR' 
domain are also 

found in other prokaryotic and eukaryotic proteins which are listed 
below. 

a) Proteins containing both a 'J' and a 'CRR' domain: 

- Yeast protein MAS5/YDJ1 which seems to be involved in 
mitochondrial protein 

import. 

- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein
folding.

- Yeast protein SCJ1, involved in protein sorting.

- Yeast protein XDJ1 . 

- Plants dnaJ homologs (from leek and cucumber). 

- Human HDJ2, a dnaJ homolog of unknown function. 

- Yeast hypothetical protein YNL077W. 

b) Proteins containing a 'J' domain without a 'CRR' domain: 

- Rhizobium fredii nolC, a protein involved in cultivar-specific 
nodulation 

of soybean. 

- Escherichia coli cbpA [3], a protein that binds curved DNA.

- Yeast protein SEC63/NPL1, important for protein assembly into the
endoplasmic reticulum and the nucleus.

- Yeast protein SIS1, required for nuclear migration during
mitosis. 

- Yeast protein CAJ1. 

- Yeast hypothetical protein YFR041c.

- Yeast hypothetical protein YIR004w. 

- Yeast hypothetical protein YJL162c. 

- Plasmodium falciparum ring-infected erythrocyte surface 
antigen (RESA). 

RESA, whose function is not known, is associated with the
membrane skeleton 
of newly invaded erythrocytes. 

- Human HDJ1. 

- Human HSJ1 , a neuronal protein. 

- Drosophila cysteine-string protein (csp). 

We developed a signature pattern for the 'J' domain, based on conserved
positions in the C-terminal half of this domain. We also developed a pattern
for the 'CRR' domain, based on the first two copies of that motif. We also
developed a profile for the 'J' domain.



Description of pattern(s) and/or profile(s)

Consensus pattern [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FY]

Sequences known to belong to this class detected by the pattern 
ALL 

Other sequence(s) detected in SWISS-PROT 5. 

Consensus pattern C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G

Sequences known to belong to this class detected by the pattern 

ALL, except for yeast XDJ1 . 

Other sequence(s) detected in SWISS-PROT 8. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both a signature pattern 

and a profile. As the profile is much more sensitive than the 

pattern, you should use it if you have access to the necessary 

software tools to do so. 

Expert(s) to contact by email 

Kelley W. kelley@medecine.unige.ch

Last update 

July 1998 / Patterns and text revised. 

References 

[1]

Cyr D.M., Langer T., Douglas M.G.
Trends Biochem. Sci. 19:176-181(1994).

[2]

Bork P., Sander C., Valencia A., Bukau B.
Trends Biochem. Sci. 17:129-129(1992).

[3]

Ueguchi C., Kaneda M., Yamada H., Mizuno T.
Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).



Deoxynucleoside kinase 



Accession number: PF01712 

Definition: Deoxynucleoside kinase 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1744 (release 4.1)

Gathering cutoffs: 25 25 

Trusted cutoffs: 47.50 47.50

Noise cutoffs: -5.40 -5.40

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97236800 

Reference Title: Cloning of the cDNA and chromosome localization of the gene

Reference Title: for human thymidine kinase 2.

Reference Author: Johansson M, Karlsson A;

Reference Location: J Biol Chem 1997;272:8454-8458.

Reference Number: [2]

Reference Medline: 96293511

Reference Title: Cloning and expression of human deoxyguanosine kinase cDNA.

Reference Author: Johansson M, Karlsson A;

Reference Location: Proc Natl Acad Sci U S A 1996;93:7258-7262.

Database Reference: INTERPRO; IPR002624;
Comment: This family consists of various deoxynucleoside kinases

Comment: cytidine EC:2.7.1.74, guanosine EC:2.7.1.113, adenosine EC:2.7.1.76

Comment: and thymidine kinase EC:2.7.1.21 (which also phosphorylates deoxyuridine

Comment: and deoxycytosine.) These enzymes catalyse the production of

Comment: deoxynucleotide 5'-monophosphate from a deoxynucleoside.

Comment: Using ATP and yielding ADP in the process.
Number of members: 20


DSL 




Delta serrate ligand 


Accession number: PF01414 

Definition: Delta serrate ligand 

Author: Ponting CP, Schultz J, Bork P 

Alignment method of seed: Manual 

Source of seed members: SMART 

Gathering cutoffs: 25 25 

Trusted cutoffs: 43.00 43.00 

Noise cutoffs: 3.40 3.40 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96125168

Reference Title: Interchangeability of Caenorhabditis elegans
DSL proteins 

Reference Title: and intrinsic signalling activity of their 
extracellular 

Reference Title: domains in vivo. 
Reference Author: Fitzgerald K, Greenwald I; 
Reference Location: Development 1995;121:4275-4282.
Reference Number: [2] 
Reference Medline: 92034990 

Reference Title: Specific EGF repeats of Notch mediate 
interactions with 

Reference Title: Delta and Serrate: implications for Notch as 
a 

Reference Title: multifunctional receptor. 

Reference Author: Rebay I, Fleming RJ, Fehon RG, Cherbas L, Cherbas P,

Reference Author: Artavanis-Tsakonas S;
Reference Location: Cell 1991;67:687-699.
Reference Number: [3] 
Reference Medline: 95232495 
Reference Title: Notch signaling. 

Reference Author: Artavanis-Tsakonas S, Matsuno K, Fortini 
ME; 

Reference Location: Science 1995;268:225-232.
Database reference: SMART; DSL;
Database Reference INTERPRO; IPR001774; 
Number of members: 30 




DUF125 




integral membrane 
protein DUF125 


Accession number: PF01988 

Definition: Integral membrane protein DUF125

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: -60 -60 

Trusted cutoffs: -57.90 -57.90 

Noise cutoffs: -64.60 -64.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 95028150 

Reference Title: Sequence, mapping and disruption of
CCC1, a gene that

Reference Title: cross-complements the Ca(2+)-sensitive
phenotype of csg1

Reference Title: mutants.

Reference Author: Fu D, Beeler T, Dunn T;

Reference Location: Yeast 1994;10:515-521.

Database Reference INTERPRO; IPR002839; 

Comment: This family of predicted integral membrane 

proteins has no known 

Comment: function. However it does include 
Swiss:P47818, that may have a 

Comment: role in regulating calcium levels [1]. 
Number of members: 7 
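
The family records in this section share a flat "Key: value" layout in which long values wrap onto unlabeled continuation lines. A minimal Python sketch of how such a record could be read into a dictionary is given below. The function name `parse_record` and the `KNOWN_KEYS` tuple are illustrative assumptions, not part of the specification, and the key list covers only labels that appear in the records above.

```python
# Hypothetical helper: labels copied from the records in this section.
KNOWN_KEYS = (
    "Accession number", "Definition", "Author", "Alignment method of seed",
    "Source of seed members", "Gathering cutoffs", "Trusted cutoffs",
    "Noise cutoffs", "HMM build command line", "Reference Number",
    "Reference Medline", "Reference Title", "Reference Author",
    "Reference Location", "Comment", "Number of members",
)


def parse_record(text):
    """Map each known label to the list of values found for it.

    A line that does not start with a known label is treated as a
    continuation of the most recent value.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        for key in KNOWN_KEYS:
            prefix = key + ":"
            if line.startswith(prefix):
                fields.setdefault(key, []).append(line[len(prefix):].strip())
                last_key = key
                break
        else:
            # No label matched: append to the previous value, if any.
            if last_key is not None:
                fields[last_key][-1] += " " + line
    return fields


record = """Accession number: PF01988
Gathering cutoffs: -60 -60
Comment: This family of predicted integral membrane
proteins has no known
Comment: function.
Number of members: 7"""

parsed = parse_record(record)
comment = " ".join(parsed["Comment"])
```

Matching against a fixed label list, rather than splitting on the first colon, keeps continuation lines such as "Swiss:P47818, that may have a" from being mistaken for new fields.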


DUF25 




Domain of unknown 
function DUF25 


Accession number: PF01641 

Definition: Domain of unknown function DUF25 

Author: Bateman A, Enwright A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1539 (release 4.1)

Gathering cutoffs: 25 25 

Trusted cutoffs: 151.80 151.80

Noise cutoffs: 10.60 10.60 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 20076492 

Reference Title: Novel selenoproteins identified in silico and 
in vivo by 

Reference Title: using a conserved RNA structural motif. 
Reference Author: Lescure A, Gautheret D, Carbon P, Krol
A;

Reference Location: J Biol Chem 1999;274:38147-38154.
Database Reference INTERPRO; IPR002579; 
Comment: This domain has no known function, it is 
found associated 

Comment: with the peptide methionine sulfoxide 
reductase enzymatic 

Comment: domain PMSR. The domain has two 
conserved cysteine 

Comment: and histidines that could suggest a zinc
binding site.

Comment: The final cysteine is found to be replaced by 
the rare amino 

Comment: acid selenocysteine in some members of 
the family [1]. 

Number of members: 26 


DUF26 




Domain of unknown 
function DUF26 


Accession number: PF01657

Definition: Domain of unknown function DUF26 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_980 (release 4.1 ) 

Gathering cutoffs: -8 -8 

Trusted cutoffs: 6.50 1.40

Noise cutoffs: -17.50 -17.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002902; 

Database reference: PFAMB; PB005223; 

Comment: This domain has no known function. It is 

found in serine/threonine 

Comment: kinases, associated with the Eukaryotic
protein kinase domain

Comment: pkinase. In the 33kDa secretory protein
Swiss:O82551

Comment: this domain is duplicated. The domain 
contains four conserved 
Comment: cysteines. 
Number of members: 25 


DUF89 






Accession number: PF01937 

Definition: Protein of unknown function DUF89 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 636.30 636.30 

Noise cutoffs: -142.40 -142.40

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002791;

Comment: This prokaryotic family has no known
function. The protein

Comment: has two closely spaced conserved cysteines 
at its N 

Comment: terminus and a single conserved cysteine at 

its C terminus. 

Number of members: 5 


DUF90 






Accession number: PF01938 

Definition: Domain of unknown function DUF90 

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 0 

Trusted cutoffs: 78.90 10.20 

Noise cutoffs: -0.60 -0.60 

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002792; 

Comment: This small domain has no known function. 

However it 

Comment: may perform a nucleic acid binding role 
(Bateman A. 

Comment: unpublished observation). 
Number of members: 17
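
Each record lists its gathering, trusted and noise cutoffs as a pair of numbers. The Python sketch below shows one way a hit could be tested against such a pair. It assumes, as a labeled assumption that the text above does not state, that the first number is a per-sequence bit-score threshold and the second a per-domain threshold; the function names `parse_cutoffs` and `passes_gathering` are hypothetical.

```python
def parse_cutoffs(line):
    """Split a line such as 'Gathering cutoffs: 25 0' into two floats."""
    _label, _sep, values = line.partition(":")
    first, second = (float(v) for v in values.split())
    return first, second


def passes_gathering(seq_score, dom_score, gathering):
    """Return True when both scores reach their thresholds.

    Assumption: gathering = (per-sequence cutoff, per-domain cutoff).
    """
    seq_cut, dom_cut = gathering
    return seq_score >= seq_cut and dom_score >= dom_cut


# Cutoffs as listed for the DUF90 record.
ga = parse_cutoffs("Gathering cutoffs: 25 0")
accepted = passes_gathering(30.0, 5.0, ga)
rejected = passes_gathering(20.0, 5.0, ga)
```

Under this reading, a family such as DUF90 with cutoffs of 25 and 0 would accept any domain scoring at least 0 bits provided the whole-sequence score reaches 25 bits.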


Dynein_light


PDOC00953 


Dynein light chain type 1 
signature 


Dynein is a multisubunit microtubule-dependent motor enzyme
that acts as the

force generating protein of eukaryotic cilia and flagella. The
cytoplasmic

isoform of dynein acts as a motor for the intracellular retrograde
motility of

vesicles and organelles along microtubules. Dynein is composed
of a number of

ATP-binding large subunits, intermediate size subunits and small
subunits.

Among the small subunits, there is a family [1,2] of highly
conserved proteins
which consist of:

- Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 
11 Kd light 

chains. 

- Higher eukaryotes cytoplasmic dynein light chain 1 . 

- Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). 

- Caenorhabditis elegans hypothetical dynein light chains M18.2 
and T26A5.9. 

These proteins have from 89 to 120 amino acids. As a
signature pattern, 

we selected a highly conserved region. 
Description of pattern(s) and/or profile(s)

Consensus pattern H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

November 1997 / First entry.
References

[1]

King S.M., Patel-King R.S.

J. Biol. Chem. 270:11445-11452(1995).

[2]

Dick T., Ray K., Salz H.K., Chia W.
Mol. Cell. Biol. 16:1966-1977(1996).


eIF5_eIF2B




Domain found in 
IF2B/IF5


Accession number: PF01873 

Definition: Domain found in IF2B/IF5 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 233.00 233.00 

Noise cutoffs: -56.10 -56.10 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96060092

Reference Title: Multidomain organization of eukaryotic
guanine nucleotide

Reference Title: exchange translation initiation factor eIF-2B
subunits

Reference Title: revealed by analysis of conserved
sequence motifs.

Reference Author: Koonin EV;

Reference Location: Protein Sci 1995;4:1608-1617.

Database Reference INTERPRO; IPR002735; 

Comment: This family includes the N terminus of eIF-5

Swiss:P55010, and

Comment: the C terminus of eIF-2 beta Swiss:P20042.
This region

Comment: corresponds to the whole of the
archaebacterial eIF-2 beta

Comment: homolog. The region contains a putative 
zinc binding C4 finger. 
Number of members: 20 


eIF6




eIF-6 family


This family comprises members exhibiting sequence identity to the
eukaryotic translation initiation factor 6. Some members of this
family are implicated in protein biosynthesis as a translation
initiation factor by binding to the 60S ribosomal subunit and
preventing its association with the 40S ribosomal subunit to form
the 80S initiation complex. Such activity can play a role in maximal
polysome formation and plays an important role in determining
free 60S ribosomal subunit content. Polypeptides in this family
can optimize amino acid and nitrogen content in a desired cell or
organism. References describing eIF6 family members and their
biological activities include, for example, the following: Adams et
al., Science 287:2185-2195(2000); Wood et al., J. Biol. Chem.
274:11653-11659(1999); and Si et al., Mol. Cell. Biol. 19:1416-
1426(1999).


ER 


PDOC00992 


Enhancer of rudimentary 
signature 


The Drosophila protein 'enhancer of rudimentary' (gene e(r))
is a small 

protein of 104 residues whose function is not yet clear. From an 
evolutionary 

point of view, it is highly conserved [1] and has been found to 
exist in 

probably all multicellular eukaryotic organisms. It has been
proposed that

this protein plays a role in the cell cycle.

As a signature pattern, we selected a conserved region in the
central part of
the protein.

Description of pattern(s) and/or profile(s)

Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S
Sequences known to belong to this class detected by the pattern 
ALL 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / First entry. 

References 

[1] 

Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota
S.I.

Gene 186:189-195(1997). 


ER_lumen_recept


PDOC00732 


ER lumen protein 
retaining receptor 
signatures 


Proteins that reside in the lumen of the endoplasmic reticulum 
(ER) contain a 

C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L) that 
serves as a signal 

for their retrieval (retrograde transport) from subsequent 
compartments of the 

secretory pathway. The signal is recognized by a receptor 
molecule that is 

believed to cycle between the cis side of the Golgi apparatus and 
the ER [1]. 

This protein is known as the ER lumen protein retaining receptor 
or also as 

the 'KDEL receptor'. It has been characterized in a variety of 
species, 

including fungi (gene ERD2), plants, Plasmodium, Drosophila 
and mammals. In 

mammals two highly related forms of the receptor are known. 
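
The retrieval signal described above is a C-terminal tetrapeptide, so checking for it amounts to a suffix test on the one-letter sequence. The following Python sketch is illustrative only: the function name `has_er_retrieval_signal` and the example sequences are hypothetical, and only the two tetrapeptides named in the text (K-D-E-L and H-D-E-L) are considered.

```python
# The two tetrapeptides named in the text, in one-letter code.
RETRIEVAL_SIGNALS = ("KDEL", "HDEL")


def has_er_retrieval_signal(protein_seq):
    """Return True when the sequence ends in KDEL or HDEL."""
    return protein_seq.strip().upper().endswith(RETRIEVAL_SIGNALS)


# Hypothetical sequences, for illustration only.
with_signal = has_er_retrieval_signal("MKTAYIAKQRKDEL")
internal_only = has_er_retrieval_signal("MKDELTAYIAKQR")
```

The test looks only at the last four residues because, as the text states, the signal is C-terminal; an internal K-D-E-L does not count.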

Structurally, the receptor is a protein of about 220 residues that 
seems to 

contain seven transmembrane regions [2]. The N-terminal part (3 
residues) is 

oriented toward the lumen while the C-terminal tail (about 12
residues) is

cytoplasmic. There are three lumenal and three cytoplasmic
loops. 

We developed two signature patterns for these receptors. The 
first pattern 

corresponds to the C-terminal half of the first cytoplasmic loop 
as well as 

most of the second transmembrane domain. The second 
pattern is a perfectly 

conserved decapeptide that corresponds to the central part of 
the fifth 

transmembrane domain. 

Description of pattern(s) and/or profile(s)

Consensus pattern G-[LIV]-S-x-[KR]-x-[QH]-x-L-[FY]-x-[LIV](2)-
[FYW]-x(2)-R-Y

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern L-E-[SA]-V-A-I-[LM]-P-Q-[LI] 

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
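
The consensus patterns in this section use the PROSITE notation: elements separated by hyphens, `x` for any residue, square brackets for allowed residues, curly braces for excluded residues, and a count in parentheses for repeats. As a rough sketch, such a pattern can be translated into a regular expression and searched against a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical, and the sketch ignores the anchoring symbols of the full notation.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Separate the residue specification from an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            regex = core                      # literal residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Second ER lumen protein retaining receptor pattern, as printed above.
decapeptide = prosite_to_regex("L-E-[SA]-V-A-I-[LM]-P-Q-[LI]")
# Hypothetical sequence containing the decapeptide.
hit = re.search(decapeptide, "MMLEAVAIMPQLGG")
```

A pattern converted this way finds matches anywhere in the sequence, which agrees with how the signature patterns are described: each marks a conserved region, not the whole protein.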
Last update 

December 1999 / Patterns and text revised. 

References 

[1]

Pelham H.R.B.
Curr. Opin. Cell Biol. 3:585-591(1991).
[2]

Townsley F.M., Wilson D.W., Pelham H.R.B.
EMBO J. 12:2821-2829(1993).


ETF_alpha


PDOC00583 


Electron transfer 
flavoprotein alpha- 
subunit signature 


The electron transfer flavoprotein (ETF) [1,2] serves as a specific
electron

acceptor for various mitochondrial dehydrogenases. ETF
transfers electrons to

the main respiratory chain via ETF-ubiquinone oxidoreductase.
ETF is a

heterodimer that consists of an alpha and a beta subunit and
which binds one

molecule of FAD per dimer. A similar system also exists in some
bacteria.

The alpha subunit of ETF is a protein of about 32 Kd which is
structurally

related to the bacterial nitrogen fixation protein fixB which could
play a 

role in a redox process and feed electrons to ferredoxin. 
Other related proteins are: 

- Escherichia coli hypothetical protein ydiR. 

- Escherichia coli hypothetical protein ygcQ. 

As a signature pattern for these proteins we selected a highly 
conserved 

region which is located in the C-terminal section. 
Description of pattern(s) and/or profile(s)

Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-
x(2)-G-x(6)-[IV]-x-A-[IV]-N

Sequences known to belong to this class detected by the pattern 
ALL, except for ygcQ. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1998 / Text revised.

References 

[1] 

Finocchiaro G., Ikeda Y., Ito M., Tanaka K.
Prog. Clin. Biol. Res. 321:637-652(1990). 

[2] 

Tsai M.H., Saier M.H. Jr.

Res. Microbiol. 146:397-404(1995). 


Euk_porin


PDOC00483 


Eukaryotic mitochondrial 
porin signature 


The major protein of the outer mitochondrial membrane of 
eukaryotes is a 

porin that forms a voltage-dependent anion-selective channel 
(VDAC) that 

behaves as a general diffusion pore for small hydrophilic
molecules [1 to 4].

The channel adopts an open conformation at low or zero 
membrane potential and 

a closed conformation at potentials above 30-40 mV. 

This protein contains about 280 amino acids and its sequence is 
composed of 

between 12 and 16 beta-strands that span the mitochondrial
outer membrane. 

Yeast contains two members of this family (genes POR1 and 
POR2); vertebrates 

have at least three members (genes VDAC1, VDAC2 and
VDAC3) [5]. 

As a signature pattern we selected a conserved region
located at the C-terminal part of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-
[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update

July 1999 / Pattern and text revised.
References 
[1] 

Benz R. 

Biochim. Biophys. Acta 1197:167-196(1994). 
[2] 

Mannella C.A.

Trends Biochem. Sci. 17:315-320(1992). 
[3] 

Dihanich M. 

Experientia 46:146-153(1990). 
[4] 

Forte M., Guy H.R., Mannella C.A.

J. Bioenerg. Biomembr. 19:341-350(1987). 

[5] 

Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J.
Genomics 36:192-196(1996). 


F_bP_aldolase


PDOC00523 


Fructose-bisphosphate
aldolase class-II
signatures


Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic
enzyme that

catalyzes the reversible aldol cleavage or condensation of
fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and
glyceraldehyde 3-phosphate.

There are two classes of fructose-bisphosphate aldolases with
different

catalytic mechanisms. Class-II aldolases [2], mainly found in
prokaryotes and

fungi, are homodimeric enzymes which require a divalent metal
ion - generally

zinc - for their activity.

This family also includes the following proteins: 

- Escherichia coli galactitol operon protein gatY which 
catalyzes the 

transformation of tagatose 1,6-bisphosphate into glycerone
phosphate and D-
glyceraldehyde 3-phosphate.

- Escherichia coli N-acetyl galactosamine operon protein agaY
which catalyzes 

the same reaction as that of gatY. 

As signature patterns for this class of enzyme, we selected two 
conserved 

regions. The first pattern is located in the first half of the 
sequence and 

contains two histidine residues that have been shown [4] to be 
involved in 

binding a zinc ion. The second is located in the C-terminal 
section and 

contains clustered acidic residues and glycines. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [FYVMT]-x(1,3)-[LIVMH]-[APNT]-[LIVM]-
x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are zinc ligands]
Sequences known to belong to this class detected by the pattern
ALL, except for Mycoplasma pneumoniae aldolase.
Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E
Sequences known to belong to this class detected by the pattern
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1]

Perham R.N. 

Biochem. Soc. Trans. 18:185-187(1990). 
[2] 

Marsh J.J. , Lebherz H.G. 

Trends Biochem. Sci. 17:110-113(1992). 

[3] 

von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J.
Mol. Microbiol. 3:1625-1637(1989).

[4]

Berry A., Marshall K.E. 
FEBS Lett. 318:11-16(1993). 


FAA_hydrolase




Fumarylacetoacetate 
(FAA) hydrolase family


Accession number: PF01557 

Definition: Fumarylacetoacetate (FAA) hydrolase family 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_641 (release 4.0)

Gathering cutoffs: 25 25

Trusted cutoffs: 42.10 42.10

Noise cutoffs: -93.10 -93.10

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97255958

Reference Title: Mutations in the fumarylacetoacetate
hydrolase gene causing

Reference Title: hereditary tyrosinemia type I: overview.

Reference Author: St-Louis M, Tanguay RM;

Reference Location: Hum Mutat 1997;9:291-299.

Reference Number: [2] 

Reference Medline: 96125235 

Reference Title: Molecular characterization of the
4-hydroxyphenylacetate

Reference Title: catabolic pathway of Escherichia coli W:
engineering a

Reference Title: mobile aromatic degradative cluster.
Reference Author: Prieto MA, Diaz E, Garcia JL;
Reference Location: J Bacteriol 1996;178:111-120.
Reference Number: [3]
Reference Medline: 96016123

Reference Title: Fungal metabolic model for human type I
hereditary

Reference Title: tyrosinaemia.

Reference Author: Fernandez-Canon JM, Penalva MA;
Reference Location: Proc Natl Acad Sci USA 1995;92:9132-9136.

Reference Number: [4] 
Reference Medline: 94039092 

Reference Title: Purification, nucleotide sequence and some 
properties of a 

Reference Title: bifunctional isomerase/decarboxylase from 
the 

Reference Title: homoprotocatechuate degradative pathway 

of Escherichia coli 

Reference Title: C. 

Reference Author: Roper DI, Cooper RA;

Reference Location: Eur J Biochem 1993;217:575-580.

Database reference: MIM; 276700; 

Database Reference INTERPRO; IPR002529;

Comment: This family consists of fumarylacetoacetate
(FAA) hydrolase,

Comment: or fumarylacetoacetate hydrolase (FAH) and
it also includes

Comment: HHDD isomerase/OPET decarboxylase
from E. coli strain W.

Comment: FAA is the last enzyme in the tyrosine
catabolic pathway, it hydrolyses

Comment: fumarylacetoacetate into fumarate and
acetoacetate which then join the

Comment: citric acid cycle [1]. Mutations in FAA cause
type I tyrosinemia in humans

Comment: this is an inherited disorder mainly affecting
the liver leading to

Comment: liver cirrhosis, hepatocellular carcinoma,
renal tubular damages and

Comment: neurologic crises amongst other symptoms
[1]. The enzymatic defect causes

Comment: the toxic accumulation of
phenylalanine/tyrosine catabolites [3].

Comment: The E. coli W enzyme HHDD
isomerase/OPET decarboxylase contains two

Comment: copies of this domain and functions in fourth
and fifth steps of the

Comment: homoprotocatechuate pathway;
Comment: here it decarboxylates OPET to HHDD and
isomerizes this to OHED.

Comment: The final products of this pathway are
pyruvic acid and succinic
Comment: semialdehyde.
Number of members: 33 


FAD_binding




FAD binding domain 


Accession number: PF00667 

Definition: FAD binding domain 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_180 (release 2.1)

Gathering cutoffs: 16.8 16.8

Trusted cutoffs: 24.60 16.80 

Noise cutoffs: 13.50 15.90 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 95386502

Reference Title: The flavin reductase activity of the
flavoprotein component

Reference Title: of sulfite reductase from Escherichia coli. A
new model for

Reference Title: the protein structure.

Reference Author: Eschenbrenner M, Coves J, Fontecave M;

Reference Location: J Biol Chem 1995;270:20550-20555. 

Reference Number: [2] 

Reference Medline: 96049560

Reference Title: NADPH-sulfite reductase flavoprotein from
Escherichia coli:

Reference Title: contribution to the flavin content and
subunit interaction.

Reference Author: Eschenbrenner M, Coves J, Fontecave M;
Reference Location: FEBS Lett 1995;374:82-84.
Reference Number: [3]
Reference Medline: 94360001 

Reference Title: Dissection of NADPH-cytochrome P450 
oxidoreductase into 

Reference Title: distinct functional domains. 
Reference Author: Smith GC, Tew DG, Wolf CR; 
Reference Location: Proc Natl Acad Sci U S A 1994;91:8710-8714.

Reference Number: [4]
Reference Medline: 97385116

Reference Title: Three-dimensional structure of NADPH- 
cytochrome P450 

Reference Title: reductase: prototype for FMN- and FAD- 
containing enzymes. 

Reference Author: Wang M, Roberts DL, Paschke R, Shea
TM, Masters BS, Kim JJ;

Reference Location: Proc Natl Acad Sci U S A 1997;94:8411-8416.

Database Reference: SCOP; 1amo; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR001709;

Database Reference PDB; 1amo A; 274; 493;

Database Reference PDB; 1amo B; 274; 493;

Database Reference PDB; 1quf ; 77; 120;

Database reference: PFAMB; PB001390;

Comment: This domain is found in sulfite reductase,
NADPH cytochrome P450

Comment: reductase and Nitric oxide synthase.
Number of members: 87


FAD_binding_3 




FAD binding domain


Accession number: PF01494

Definition: FAD binding domain 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_549 (release 4.0) 

Gathering cutoffs: -7 -7 

Trusted cutoffs: -6.20 -6.20

Noise cutoffs: -7.90 -7.90 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1]

Reference Medline: 93028353 

Reference Title: Crystal structure of the reduced form of p- 
hydroxybenzoate 

Reference Title: hydroxylase refined at 2.3A resolution. 

Reference Author: Schreuder HA, van der Laan JM, Swarte
MB, Kalk KH, Hol WG,

Reference Author: Drenth J;

Reference Location: Proteins 1992;14:178-190.

Database Reference: SCOP; 2phh; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002938; 
Database Reference PDB; 1pxa ; 5; 35;
Database Reference PDB; 1bf3 ; 5; 139;
Database Reference PDB; 1bgj ; 5; 139;
Database Reference PDB; 1bgn ; 5; 139;
Database Reference PDB; 1bkw ; 5; 139;
Database Reference PDB; 1cc4 A; 5; 139;
Database Reference PDB; 1cc6 A; 5; 139;
Database Reference PDB; 1cj2 A; 5; 139;
Database Reference PDB; 1pbb ; 5; 139;
Database Reference PDB; 1pbc ; 5; 139;
Database Reference PDB; 1pbd ; 5; 139;
Database Reference PDB; 1pbe ; 5; 139;
Database Reference PDB; 1pbf ; 5; 139;
Database Reference PDB; 1pdh ; 5; 139;
Database Reference PDB; 2phh ; 5; 139;
Database Reference PDB; 1cj3 A; 5; 139;
Database Reference PDB; 1cj4 A; 5; 139;
Database Reference PDB; 1phh ; 5; 139;
Database Reference PDB; 1d7l A; 5; 139;
Database Reference PDB; 1dob ; 5; 139;
Database Reference PDB; 1doc ; 5; 139;
Database Reference PDB; 1dod ; 5; 139;
Database Reference PDB; 1doe ; 5; 139;
Database Reference PDB; 1ius ; 5; 139;
Database Reference PDB; 1iut ; 5; 139;
Database Reference PDB; 1iuu ; 5; 139;
Database Reference PDB; 1iuv ; 5; 139;
Database Reference PDB; 1iuw ; 5; 139;
Database Reference PDB; 1iux ; 5; 139;
Database Reference PDB; 1foh A; 10; 151;
Database Reference PDB; 1foh D; 10; 151;
Database Reference PDB; 1foh B; 10; 151;
Database Reference PDB; 1foh C; 10; 151;

Database reference: PFAMB; PB040546; 

Comment: This domain is involved in FAD binding in a 

number of enzymes. 

Number of members: 52


FAD_binding_4


PDOC00674 


Oxygen oxidoreductases 
covalent FAD-binding 
site 


Some oxygen-dependent oxidoreductases are flavoproteins
that contain a

covalently bound FAD group which is attached to a histidine via
an 8-alpha-

(N3-histidyl)-riboflavin linkage. These proteins are:

- 6-hydroxy-D-nicotine oxidase (EC 1.5.3.6) (6-HDNO) [1], a
bacterial enzyme 

that catalyzes the oxygen-dependent degradation of 6- 
hydroxynicotine into 
6-hydroxypyrid-N-methylosmine 

- Plant reticuline oxidase (EC 1.5.3.9) [2] (berberine-bridge-
forming

enzyme), an enzyme that catalyzes the oxidation of (S)- 
reticuline into (S)- 

scoulerine in the pathway leading to benzophenanthridine 
alkaloids. 

- L-gulonolactone oxidase (EC 1.1.3.8) (l-gulono-gamma-lactone 
oxidase) [3], 

a mammalian enzyme which catalyzes the oxidation of L-
gulono-1,4-lactone to

L-xylo-hexulonolactone which spontaneously isomerizes to L-
ascorbate.

- D-arabinono-1,4-lactone oxidase (EC 1.1.3.24) (L-
galactonolactone oxidase),

a yeast enzyme involved in the biosynthesis of D- 
erythroascorbic acid [4]. 

- Mitomycin radical oxidase [5], a bacterial protein involved in 
mitomycin 

resistance and that probably oxidizes the reduced form of 
mitomycins. 

- Rhodococcus fascians fasciation locus protein fas5. 

The region around the histidine that binds the FAD group is 
conserved in these 

enzymes and can be used as a signature pattern. 
Description of pattern(s) and/or profile(s) 

Consensus pattern P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-
x(3)-[GSA]-[GST]-G-H [H is the FAD binding site]
Sequences known to belong to this class detected by the pattern 
ALL 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Text revised. 
References
[1]

Brandsch R., Hinkkanen A.E., Mauch L., Nagursky H., Decker K.
Eur. J. Biochem. 167:315-320(1987).

[2]

Dittrich H., Kutchan T.M.

Proc. Natl. Acad. Sci. U.S.A. 88:9969-9973(1991).
[3]

Koshizaka T., Nishikimi M., Ozawa T., Yagi K.
J. Biol. Chem. 263:1619-1621(1988).

[4]

Huh W.-K., Kim S.-T., Kim J.-Y., Hwang S.-W., Kang S.-O.
EMBL/GenBank: U40390.
[5] 

August P.R., Flickinger M.C., Sherman D.H. 
J. Bacteriol. 176:4448-4454(1994). 


fer2 


PDOC00175; 
PDOC00642 


2Fe-2S ferredoxins, iron-
sulfur binding region
signature; Adrenodoxin
family, iron-sulfur binding
region signature


Ferredoxins [1] are a group of iron-sulfur proteins which mediate
electron

transfer in a wide variety of metabolic reactions. Ferredoxins can
be divided


into several subgroups depending upon the physiological nature 
of the iron 

sulfur cluster(s) and according to sequence similarities. One
of these 

subgroups are the 2Fe-2S ferredoxins, which are proteins or 
domains of around 

one hundred amino acid residues that bind a single 2Fe-2S iron- 
sulfur cluster. 

The proteins that are known [2] to belong to this family are listed 
below. 

- Ferredoxin from photosynthetic organisms; namely plants and 
algae where it 

is located in the chloroplast or cyanelle; and cyanobacteria. 

- Ferredoxin from archaebacteria of the Halobacterium genus.

- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter 
capsulatus. 

- Ferredoxin in the toluene degradation operon (gene xylT) and 
naphthalene 

degradation operon (gene nahT) of Pseudomonas putida. 

- Hypothetical Escherichia coli protein yfaE. 

- The N-terminal domain of the bifunctional ferredoxin/ferredoxin 
reductase 

electron transfer component of the benzoate 1,2-dioxygenase
complex (gene

benC) from Acinetobacter calcoaceticus, the toluene 4-
monooxygenase complex

(gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ),
and the xylene

monooxygenase system (gene xylA) from Pseudomonas.

- The N-terminal domain of phenol hydroxylase protein p5
(gene dmpP) from

Pseudomonas putida.

- The N-terminal domain of methane monooxygenase 
component C (gene mmoC) 

from Methylococcus capsulatus.

- The C-terminal domain of the vanillate degradation pathway
protein vanB in

a Pseudomonas species.

- The N-terminal domain of bacterial fumarate reductase iron-
sulfur protein

(gene frdB).

- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen
reductase (gene ascD) 
from Yersinia pseudotuberculosis. 

- The central domain of eukaryotic succinate dehydrogenase 
(ubiquinone) iron- 
sulfur protein. 

- The N-terminal domain of eukaryotic xanthine dehydrogenase. 

- The N-terminal domain of eukaryotic aldehyde oxidase. 

In the 2Fe-2S ferredoxins, four cysteine residues bind the 
iron-sulfur 

cluster. Three of these cysteines are clustered together in the 
same region of 

the protein. Our signature pattern spans that iron-sulfur binding 
region. 

Description of pattern(s) and/or profile(s) 

Consensus pattern C-{C}-{C}-[GA]-{C}-C-[GAST]-
{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 15.

Note in addition to the proteins listed above there are a number of
other ferredoxin-like proteins that bind a 2Fe-2S cluster but which
do not seem to be evolutionarily related to this family. Among them
are the ferredoxins from the adrenodoxin family (see
<PDOC00642>) as well as the bacterial aromatic dioxygenase
systems ferredoxin-like proteins such as bnzC, ndoA, and todB.

Last update 








November 1997 / Text revised. 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate
electron transfer in a wide variety of metabolic reactions. Ferredoxins
can be divided into several subgroups depending upon the physiological
nature of the iron sulfur cluster(s) and according to sequence
similarities. One family of ferredoxins groups together the following
proteins that all bind a single 2Fe-2S iron-sulfur cluster:

- Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial
  protein which transfers electrons from adrenodoxin reductase to
  cytochrome P450scc, which is involved in cholesterol side chain
  cleavage.
- Putidaredoxin (PTX), a Pseudomonas putida protein which transfers
  electrons from putidaredoxin reductase to cytochrome P450-cam, which
  is involved in the oxidation of camphor.
- Terpredoxin [2], a Pseudomonas protein which transfers electrons from
  terpredoxin reductase to cytochrome P450-terp, which is involved in
  the oxidation of alpha-terpineol.
- Rhodocoxin [3], a Rhodococcus protein which transfers electrons from
  rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved
  in the degradation of thiocarbamate herbicides.
- Escherichia coli ferredoxin (gene fdx) [4] whose exact function is
  not yet known.
- Rhodobacter capsulatus ferredoxin VI [5], which may transfer
  electrons to a yet uncharacterized oxygenase.
- Caulobacter crescentus ferredoxin (gene fdxB) [6].

In these proteins, four cysteine residues bind the iron-sulfur cluster.
Three of these cysteines are clustered together in the same region of
the protein. Our signature pattern spans that iron-sulfur binding
region.

Description of pattern(s) and/or profile(s)

Consensus pattern C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR]
[The three C's are 2Fe-2S ligands]
Sequences known to belong to this class detected by the pattern

ALL.

Other sequence(s) detected in SWISS-PROT 1.

Last update

November 1995 / Pattern and text revised. 
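
Because the three cysteines in the adrenodoxin-family signature are the cluster ligands, a match can be turned into ligand coordinates. The following is a minimal sketch, assuming the pattern reads C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR]; the regular expression is a hand translation of that pattern, the function name is hypothetical, and the test peptide is made up for illustration.

```python
import re

# Hand translation of C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].
# Each of the three cysteines is captured in its own group.
ADX_SIGNATURE = re.compile(r"(C).{2}[STAQ].[STAMV](C)[STA]T(C)[HR]")

def ligand_cysteines(sequence):
    """Return, for every signature hit, the 1-based positions of the
    three cysteines that the pattern marks as 2Fe-2S ligands."""
    hits = []
    for match in ADX_SIGNATURE.finditer(sequence):
        hits.append(tuple(match.start(group) + 1 for group in (1, 2, 3)))
    return hits

# Illustrative peptide only (not a database sequence).
print(ligand_cysteines("GACEGTLACSTCHL"))
```

The fourth cysteine ligand mentioned in the text lies outside the signature region, so it cannot be located by this pattern alone.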
Amemiya K


Ferric_reduct

Ferric reductase like transmembrane component

Accession number: PF01794
Definition: Ferric reductase like transmembrane component
Author: Bashton M, Bateman A
Alignment method of seed: T Coffee
Source of seed members: Pfam-B_728 (release 4.2)
Gathering cutoffs: -122 -122
Trusted cutoffs: -34.80 -34.80
Noise cutoffs: -210.30 -210.30
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 93309468
Reference Title: The fission yeast ferric reductase gene frp1+ is required
Reference Title: for ferric iron uptake and encodes a protein that is
Reference Title: homologous to the gp91-phox subunit of the human NADPH
Reference Title: phagocyte oxidoreductase.
Reference Author: Roman DG, Dancis A, Anderson GJ, Klausner RD;
Reference Location: Mol Cell Biol 1993;13:4342-4350.
Reference Number: [2]
Reference Medline: 92294876
Reference Title: Cytochrome b558: the flavin-binding component of the
Reference Title: phagocyte NADPH oxidase.
Reference Author: Rotrosen D, Yeung CL, Leto TL, Malech HL, Kwong CH;
Reference Location: Science 1992;256:1459-1462.
Reference Number: [3]
Reference Medline: 87258189
Reference Title: The glycoprotein encoded by the X-linked chronic
Reference Title: granulomatous disease locus is a component of the
Reference Title: neutrophil cytochrome b complex.
Reference Author: Dinauer MC, Orkin SH, Brown R, Jesaitis AJ, Parkos CA;
Reference Location: Nature 1987;327:717-720.
Reference Number: [4]
Reference Medline: 87258190
Reference Title: The X-linked chronic granulomatous disease gene codes for
Reference Title: the beta-chain of cytochrome b-245.
Reference Author: Teahan C, Rowe P, Parker P, Totty N, Segal AW;
Reference Location: Nature 1987;327:720-721.
Database Reference INTERPRO; IPR002916;
Comment: This family includes a common region in the transmembrane proteins
Comment: mammalian cytochrome B-245 heavy chain (gp91-phox), ferric reductase
Comment: transmembrane component in yeast and respiratory burst oxidase from
Comment: mouse-ear cress.
Comment: This may be a family of flavocytochromes capable of moving electrons
Comment: across the plasma membrane [1].
Comment: The Frp1 protein Swiss:Q04800 from S. pombe is a ferric reductase
Comment: component and is required for cell surface ferric reductase activity;
Comment: mutants in frp1 are deficient in ferric iron uptake [1].
Comment: Cytochrome B-245 heavy chain Swiss:P04839 is a FAD-dependent
Comment: dehydrogenase; it also has electron transferase activity which reduces
Comment: molecular oxygen to superoxide anion, a precursor in the production of
Comment: microbicidal oxidants [2].
Comment: Mutations in the sequence of cytochrome B-245 heavy chain (gp91-phox)
Comment: lead to X-linked chronic granulomatous disease. The bactericidal
Comment: ability of phagocytic cells is reduced and is characterised by the
Comment: absence of a functional plasma membrane associated NADPH oxidase [3].
Comment: The chronic granulomatous disease gene codes for the beta chain of
Comment: cytochrome B-245 and cytochrome B-245 is missing from patients with
Comment: the disease [4].
Comment: The aligned region includes a potential FAD binding domain.

Number of members: 34 
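
The cutoff lines in a Pfam entry can be read as thresholds on HMM bit scores. The sketch below is illustrative only and rests on an assumption the entry does not state: that the two numbers on each cutoff line are a per-sequence threshold followed by a per-domain threshold. The function names parse_cutoffs and is_member are hypothetical, and the input strings are typed in by hand rather than read from a Pfam file.

```python
def parse_cutoffs(line):
    """Split a 'Label: a b' cutoff line into (label, seq_cutoff, dom_cutoff).

    Assumption: the first number is the per-sequence threshold and the
    second is the per-domain threshold.
    """
    label, values = line.split(":", 1)
    seq_cut, dom_cut = (float(v) for v in values.split())
    return label.strip(), seq_cut, dom_cut

def is_member(seq_score, dom_score, gathering):
    """True when both bit scores reach the gathering thresholds."""
    _, seq_cut, dom_cut = gathering
    return seq_score >= seq_cut and dom_score >= dom_cut

gathering = parse_cutoffs("Gathering cutoffs: -122 -122")
print(is_member(-34.8, -34.8, gathering))    # score equal to the trusted cutoff
print(is_member(-210.3, -210.3, gathering))  # score equal to the noise cutoff
```

With the values listed for this family, a hit scoring at the trusted cutoff lies above the gathering threshold and a hit scoring at the noise cutoff lies below it, which is the ordering the three cutoffs are meant to express.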


Flavi_NS5

Flavivirus RNA-directed RNA polymerase

Accession number: PF00972
Definition: Flavivirus RNA-directed RNA polymerase
Author: Finn RD, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_200 (release 3.0)
Gathering cutoffs: 12 12
Trusted cutoffs: 16.00 16.00
Noise cutoffs: 8.50 8.50
HMM build command line: hmmbuild -f HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 95159427
Reference Title: Phylogeny of TYU, SRE, and CFA virus: different
Reference Title: evolutionary rates in the genus Flavivirus.
Reference Author: Marin MS, Zanotto PM, Gritsun TS, Gould EA;
Reference Location: Virology 1995;206:1133-1139.
Reference Number: [2]
Reference Medline: 96182933
Reference Title: Recombinant dengue type 1 virus NS5 protein expressed in
Reference Title: Escherichia coli exhibits RNA-dependent RNA polymerase
Reference Title: activity.
Reference Author: Tan BH, Fu J, Sugrue RJ, Yap EH, Chan YC, Tan YH;
Reference Location: Virology 1996;216:317-325.
Reference Number: [3]
Reference Medline: 93224895
Reference Title: Computer-assisted identification of a putative
Reference Title: methyltransferase domain in NS5 protein of flaviviruses and
Reference Title: lambda 2 protein of reovirus.
Reference Author: Koonin EV;
Reference Location: J Gen Virol 1993;74:733-740.
Reference Number: [4]
Reference Medline: 94094568
Reference Title: Evolution and taxonomy of positive-strand RNA viruses:
Reference Title: implications of comparative analysis of amino acid
Reference Title: sequences.
Reference Author: Koonin EV, Dolja VV;
Reference Location: Crit Rev Biochem Mol Biol 1993;28:375-430.
Database Reference INTERPRO; IPR000208;
Comment: Flaviviruses produce a polyprotein from the ssRNA genome.
Comment: This protein is also known as NS5.
Comment: This RNA-directed RNA polymerase possesses a number of short
Comment: regions and motifs homologous to other RNA-directed RNA
Comment: polymerases [2].
Number of members: 159


Forkhead

PDOC00564

Fork head domain signatures and profile

It has been shown [1] that some eukaryotic transcription factors
contain a conserved domain of about 100 amino-acid residues, called
the fork head domain (but also known as a "winged helix"), which is
involved in DNA-binding [2]. Proteins known to contain this domain are
listed below.

- Drosophila fork head protein (fkh). Fkh is probably a transcription
  factor that regulates the expression of genes involved in terminal
  development.
- Drosophila protein crocodile (gene croc) [3], which is required for
  the establishment of head structures.
- Drosophila proteins FD2, FD3, FD4, and FD5.
- Drosophila proteins sloppy paired 1 and 2 (slp1 and slp2) involved
  in segmentation.
- Bombyx mori silk gland factor-1 (SGF-1) which regulates
  transcription of the sericin-1 gene.
- Mammalian transcriptional activators HNF-3-alpha, -beta, and -gamma.
  The HNF-3 proteins interact with the cis-acting regulatory regions
  of a number of liver genes.
- Mammalian interleukin-enhancer binding factor (ILF). ILF binds to
  the purine-rich NFAT-like motifs in the HIV-1 LTR and the
  interleukin-2 promoter. ILF may be involved in both positive and
  negative regulation of important viral and cellular promoter
  elements.
- Mammalian transcription factor BF-1 which plays an important role in
  the establishment of the regional subdivision of the developing
  brain and in the development of the telencephalon.
- Human HTLF, a protein that binds to the purine-rich region in human
  T-cell leukemia virus long terminal repeat (HTLV-I LTR).
- Mammalian transcription factors FREAC-1 (FKHL5, HFH-8), FREAC-2
  (FKHL6), FREAC-3 (FKHL7, FKH-1), FREAC-4 (FKHL8), FREAC-5 (FKHL9,
  FKH-2, HFH-6), FREAC-6 (FKHL10, HFH-5), FREAC-7 (FKHL11), FREAC-8
  (FKHL12, HFH-7), FKH-3, FKH-4, FKH-5, HFH-1 and HFH-4.
- Human AFX1 which is involved in a chromosomal translocation that
  causes acute leukemia.
- Human FKHR which is involved in a chromosomal translocation that
  causes rhabdomyosarcoma.
- Xenopus XFKH1, a protein essential for normal axis formation.
- Caenorhabditis elegans lin-31, involved in the regulation of vulval
  cell fates.
- Yeast HCM1, a protein of unknown function.
- Yeast FKH1.
- Yeast FKH2.

The fork domain is highly conserved. We have developed two patterns
for its detection. The first corresponds to the N-terminal section of
the domain; the second is a heptapeptide located in the central
section of the domain.

Description of pattern(s) and/or profile(s)

Consensus pattern [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM]
Sequences known to belong to this class detected by the pattern
ALL, except for AFX1 and FKHR.
Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern W-[QKR]-[NS]-S-[LIV]-R-H
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE. 
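
Since the entry provides two fork head signatures, a scan can report them side by side. The sketch below is an illustration under stated assumptions: the regular expressions are hand translations of the patterns [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM] and W-[QKR]-[NS]-S-[LIV]-R-H, the dictionary keys and the function name scan are hypothetical, and the test peptide is made up for the example.

```python
import re

# Hand translations of the two fork head consensus patterns.
FORK_HEAD_SIGNATURES = {
    "N-terminal": re.compile(
        r"[KR]P[PTQ][FYLVQH]S[FY].{2}[LIVM].{3,4}[AC][LIM]"),
    "central heptapeptide": re.compile(r"W[QKR][NS]S[LIV]RH"),
}

def scan(sequence):
    """Map each signature name to the 1-based start positions of its hits."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in FORK_HEAD_SIGNATURES.items()
    }

# Illustrative peptide only: an N-terminal-like motif, a spacer, and a
# heptapeptide-like motif.
toy = "KPPYSYISLITMAI" + "GGGG" + "WQNSIRH"
print(scan(toy))
```

Reporting the two signatures separately is useful because, according to the entry, AFX1 and FKHR are missed by the first pattern while the second pattern detects all known members.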
Last update 

November 1997 / Patterns and text revised. 

References

[1]
Weigel D., Jaeckle H.
Cell 63:455-456(1990).

[2]
Clark K.L., Halay E.D., Lai E., Burley S.K.
Nature 364:412-420(1993).

[3]
Haecker U., Kaufmann E., Hartmann C., Juergens G., Knoechel W.,
Jaeckle H.
EMBO J. 14:5306-5317(1995).


FtsJ

FtsJ cell division protein

Accession number: PF01728
Definition: FtsJ cell division protein
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1791 (release 4.1)
Gathering cutoffs: -38 -38
Trusted cutoffs: -20.90 -20.90
Noise cutoffs: -56.70 -56.70
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 93186701
Reference Title: The Escherichia coli FtsH protein is a prokaryotic member
Reference Title: of a protein family of putative ATPases involved in
Reference Title: membrane functions, cell cycle control, and gene
Reference Title: expression.
Reference Author: Tomoyasu T, Yuki T, Morimura S, Mori H, Yamanaka K, Niki H,
Reference Author: Hiraga S, Ogura T;
Reference Location: J Bacteriol 1993;175:1344-1351.
Database Reference INTERPRO; IPR002877;
Database reference: PFAMB; PB030182;
Comment: This family consists of FtsJ from various bacterial and archaeal
Comment: sources.
Comment: In E. coli FtsJ is not essential for growth but affects cell
Comment: division [1].
Number of members: 25


FTSW_RODA_SPOVE

PDOC00352

Cell cycle proteins ftsW / rodA / spoVE signature

A number of prokaryotic proteins involved in cell cycle processes have
been found [1,2] to be structurally related; these proteins are:

- Escherichia coli and related bacteria cell division protein ftsW.
  This protein plays a role in the stabilization of the ftsZ ring
  during cell division.
- Escherichia coli and related bacteria rod shape-determining protein
  rodA (or mrdB). It is required for the expression of the enzymatic
  activity of PBP2, which is thought to participate in the synthesis
  of peptidoglycan during the initiation of cell elongation.
- Bacillus subtilis stage V sporulation protein E (spoVE). The exact
  function of spoVE in endospore formation is not known.
- Bacillus subtilis hypothetical protein ylaO.
- Bacillus subtilis hypothetical protein ywcF (ipa-42D).
- Cyanophora paradoxa cyanelle ftsW homolog. This protein may be
  involved in the organelle division process.

All these proteins are hydrophobic integral membrane proteins and
contain about 400 residues. We have selected the best conserved
region, which is located in the C-terminal section, as a signature
pattern for these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [NV]-x(5)-[GTR]-[LIVMA]-x-P-[PTLIVM]-x-G-[LIVM]-x(3)-
[LIVMFW](2)-S-[YSA]-G-G-[STN]-[SA]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised. 

References

[1]
Ikeda M., Sato T., Wachi M., Jung H.K., Ishino F., Kobayashi Y.,
Matsuhashi M.
J. Bacteriol. 171:6375-6378(1989).

[2]
Joris B., Dive G., Henriques A., Piggot P.J., Ghuysen J.-M.
Mol. Microbiol. 4:513-517(1990).


Furin-like

Furin-like cysteine rich region

Members of this family include receptors that mediate
transmembrane signalling. These receptors can bind to a number
of factors including: amphiregulin, epidermal growth factor, gp30,
heparin-binding egf, insulin, insulin-like growth factor I and II,
neuregulins, transforming growth factor-alpha, and vaccinia
virus growth factor.

Signal transduction is mediated by the catalytic activity of a
tyrosine kinase, which catalyzes the reaction ATP + a protein
tyrosine = ADP + protein tyrosine phosphate. Typically, such signal
transduction has been implicated in metabolic and developmental
changes, including cell fate and differentiation. Examples include
instruction of follicle cells to follow a dorsal pathway of
development rather than the default ventral pathway; such receptors
may also bind the spitz protein.
References describing these family members and their biological
activities:



Attorney No. 2750-1237P 



897 



Pfam/Prosite    Full Name    Description

Abbott et al., J. Biol. Chem. 267:10759-10763(1992); Araki et al.,
J. Biol. Chem. 262:16186-16191(1987); Aroian et al., EMBO J.
13:360-366(1994); Aroian et al., Nature 348:693-699(1990);
Barbetti et al., Diabetes 41:408-415(1992); Bargmann et al.,
Nature 319:226-230(1986); Cama et al., J. Biol. Chem. 268:8060-
8069(1993); Cama et al., J. Clin. Endocrinol. Metab. 73:894-
901(1991); Carrera et al., Hum. Mol. Genet. 2:1437-1441(1993);
Clifford et al., Genetics 137:531-550(1994); Cocozza et al.,
Diabetes 41:521-526(1992); Cooke et al., Biochem. Biophys. Res.
Commun. 177:1113-1120(1991); Coussens et al., Science
230:1132-1139(1985); Dickens et al., Biochem. Biophys. Res.
Commun. 186:244-250(1992); Ebina et al., Cell 40:747-
758(1985); Ebina et al., Proc. Natl. Acad. Sci. U.S.A. 84:704-
708(1987); Ehsani et al., Genomics 15:426-429(1993); Elbein et
al., Diabetes 42:429-434(1993); Elbein, Diabetes 38:737-
743(1989); Fujita-Yamaguchi et al., Protein Seq. Data Anal. 1:3-
3(1987); Gullick et al., EMBO J. 11:43-48(1992); Haruta et al.,
Diabetes 42:1837-1844(1993); Hubbard et al., EMBO J. 16:5572-
5581(1997).

Hubbard et al., Nature 372:746-754(1994); Iwanishi et al.,
Diabetologia 36:414-422(1993); Kadowaki et al., J. Clin. Invest.
86:254-264(1990); Kadowaki et al., Science 240:787-790(1988);
Kim et al., Diabetologia 35:261-266(1992); Klinkhamer et al.,
EMBO J. 8:2503-2507(1989); Kusari et al., J. Biol. Chem.
266:5260-5267(1991); Lai et al., Neuron 6:691-704(1991); Lax et
al., Mol. Cell. Biol. 8:1970-1978(1988); Lebrun et al., J. Biol.
Chem. 268:11272-11277(1993); Lee et al., Oncogene 8:3403-
3410(1993); Lesokhin et al., Dev. Biol. 205:129-144(1999); Livneh
et al., Cell 40:599-607(1985);

Longo et al., Proc. Natl. Acad. Sci. U.S.A. 90:60-64(1993);
McKeon et al., Mol. Endocrinol. 4:647-656(1990); Moller et al., J.
Biol. Chem. 265:14979-14985(1990); Moller et al., Mol.
Endocrinol. 4:1183-1191(1990); Odawara et al., Science 245:66-
68(1989); Raz et al., Genetics 129:191-201(1991).
Sakai et al., J. Mol. Biol. 256:548-555(1996); Schaeffer et al.,
Biochem. Biophys. Res. Commun. 189:650-653(1992); Schejter et
al., Cell 46:1091-1101(1986); Seino et al., Biochem. Biophys.
Res. Commun. 159:312-316(1989); Seino et al., Diabetes 39:123-
128(1990); Semba et al., Proc. Natl. Acad. Sci. U.S.A. 82:6497-
6501(1985); Shier et al., J. Biol. Chem. 264:14605-14608(1989);
Taira et al., Science 245:63-66(1989); Tewari et al., J. Biol.
Chem. 264:16238-16245(1989); Ullrich et al., Nature 313:756-
761(1985).

Ullrich et al., EMBO J. 5:2503-2512(1986); van der Vorm et al.,
Diabetologia 36:172-174(1993); van der Vorm et al., J. Biol.
Chem. 267:66-71(1992); Wadsworth et al., Nature 314:178-
180(1985); White et al., Cell 54:641-649(1988); Xu et al., J. Biol.
Chem. 265:18673-18681(1990); Yamamoto et al., Nature
319:230-234(1986); and Yoshimasa et al., Science 240:784-
787(1988).
Galactosyltransferase

Accession number: PF01762
Definition: Galactosyltransferase
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_885 (release 4.2)
Gathering cutoffs: -46 -46
Trusted cutoffs: -43.90 -43.90
Noise cutoffs: -49.80 -49.80
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98079080
Reference Title: Cloning of a human
Reference Title: UDP-galactose:2-acetamido-2-deoxy-D-glucose 3beta-
Reference Title: galactosyltransferase catalyzing the formation of type 1
Reference Title: chains.
Reference Author: Kolbinger F, Streiff MB, Katopodis AG;
Reference Location: J Biol Chem 1998;273:433-440.
Reference Number: [2]
Reference Medline: 98079027
Reference Title: Genomic cloning and expression of three murine
Reference Title: UDP-galactose: beta-N-acetylglucosamine
Reference Title: beta1,3-galactosyltransferase genes.
Reference Author: Hennet T, Dinter A, Kuhnert P, Mattu TS, Rudd PM, Berger
Reference Author: EG;
Reference Location: J Biol Chem 1998;273:58-65.
Database Reference INTERPRO; IPR002659;
Database reference: PFAMB; PB005938;
Database reference: PFAMB; PB012965;
Comment: This family includes the galactosyltransferases
Comment: UDP-galactose:2-acetamido-2-deoxy-D-glucose 3beta-galactosyltransferase
Comment: Swiss:O43825 [1] and UDP-Gal:beta-GlcNAc beta 1,3-galactosyltransferase
Comment: Swiss:O54904 [2].
Comment: Specific galactosyltransferases transfer galactose to GlcNAc terminal
Comment: chains in the synthesis of the lacto-series oligosaccharides types 1
Comment: and 2 [1].
Number of members: 29


G-protein alpha subunit




Accession number: PF00503
Definition: G-protein alpha subunit
Author: Finn RD
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_11 (release 1.0)
Gathering cutoffs: 13.8 13.8
Trusted cutoffs: 13.80 13.80
Noise cutoffs: 9.70 12.70
HMM build command line: hmmbuild -f HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 94353239
Reference Title: Structures of active conformations of Gi alpha 1 and the
Reference Title: mechanism of GTP hydrolysis.
Reference Author: Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG,
Reference Author: Sprang SR;
Reference Location: Science 1994;265:1405-1412.
Reference Number: [2]
Reference Medline: 97004345
Reference Title: How G proteins work: a continuing story.
Reference Author: Coleman DE, Sprang SR;
Reference Location: Trends Biochem Sci 1996;21:41-44.
Database Reference: PRINTS; PR00318;
Database Reference: SCOP; 1gia; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR001019;
Database Reference PDB; 1gia; 34; 343;
Database Reference PDB; 1gil; 34; 343;
Database Reference PDB; 1as0; 32; 344;
Database Reference PDB; 1gfi; 33; 345;
Database Reference PDB; 1as2; 32; 346;
Database Reference PDB; 1bh2; 32; 346;
Database Reference PDB; 1cip A; 32; 347;
Database Reference PDB; 1git; 32; 348;
Database Reference PDB; 1agr D; 11; 353;
Database Reference PDB; 1gg2 A; 6; 348;
Database Reference PDB; 1gp2 A; 6; 348;
Database Reference PDB; 1bof; 10; 353;
Database Reference PDB; 1as3; 9; 353;
Database Reference PDB; 1gdd; 9; 353;
Database Reference PDB; 1agr A; 6; 353;
Database Reference PDB; 1tag; 27; 340;
Database Reference PDB; 1tad A; 27; 342;
Database Reference PDB; 1tad B; 27; 342;
Database Reference PDB; 1tnd B; 27; 342;
Database Reference PDB; 1tnd C; 27; 342;
Database Reference PDB; 1tad C; 27; 344;
Database Reference PDB; 1tnd A; 27; 349;
Database Reference PDB; 1cjk C; 39; 388;
Database Reference PDB; 1cjt C; 39; 388;
Database Reference PDB; 1cju C; 39; 388;
Database Reference PDB; 1cjv C; 39; 388;
Database Reference PDB; 1azt A; 35; 391;
Database Reference PDB; 1azt B; 35; 391;
Database Reference PDB; 1azs C; 36; 393;
Database reference: PFAMB; PB034080;
Comment: G proteins couple receptors of extracellular signals to intracellular
Comment: signaling pathways.
Comment: The G protein alpha subunit binds guanyl nucleotide and is a weak
Comment: GTPase.
Number of members: 245


GCV_H

Glycine cleavage H-protein

Accession number: PF01597
Definition: Glycine cleavage H-protein
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_988 (release 4.1)
Gathering cutoffs: 25 25
Trusted cutoffs: 27.90 27.90
Noise cutoffs: -58.80 -58.80
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 94255425
Reference Title: X-ray structure determination at 2.6-A resolution of a
Reference Title: lipoate-containing protein: the H-protein of the glycine
Reference Title: decarboxylase complex from pea leaves.
Reference Author: Pares S, Cohen-Addad C, Sieker L, Neuburger M, Douce R;
Reference Location: Proc Natl Acad Sci U S A 1994;91:4850-4853.
Database Reference: SCOP; 1htp; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR002930;
Database Reference PDB; 1hpc A; 2; 127;
Database Reference PDB; 1hpc B; 2; 127;
Database Reference PDB; 1htp; 2; 127;
Comment: This is a family of glycine cleavage H-proteins, part of the glycine
Comment: cleavage multienzyme complex (GCV) found in bacteria and the mitochondria
Comment: of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes.
Comment: A lipoyl group is attached to a completely conserved lysine residue.
Comment: The H protein shuttles the methylamine group of glycine from the
Comment: P protein to the T protein.
Number of members: 40


GCV_T

Glycine cleavage T-protein (aminomethyl transferase)



Accession number: PF01571
Definition: Glycine cleavage T-protein (aminomethyl transferase)
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_933 (release 4.0)
Gathering cutoffs: -146 -146
Trusted cutoffs: -124.50 -124.50
Noise cutoffs: -167.90 -167.90
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97199363
Reference Title: Cloning, and molecular characterization of the GCV1 gene
Reference Title: encoding the glycine cleavage T-protein from Saccharomyces
Reference Title: cerevisiae.
Reference Author: McNeil JB, Zhang F, Taylor BV, Sinclair DA, Pearlman RE,
Reference Author: Bognar AL;
Reference Location: Gene 1997;186:13-20.
Database Reference INTERPRO; IPR002536;
Database reference: PFAMB; PB004229;
Comment: This is a family of glycine cleavage T-proteins, part of the glycine
Comment: cleavage multienzyme complex (GCV) found in bacteria and the mitochondria
Comment: of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes.
Comment: The T-protein is an aminomethyl transferase.
Number of members: 27


G-gamma

PDOC01002

G-protein gamma subunit profile

Guanine nucleotide-binding proteins (G proteins) [1] act as
intermediaries in the transduction of signals generated by
transmembrane receptors. G proteins consist of three subunits (alpha,
beta, and gamma). The alpha subunit binds to and hydrolyzes GTP; the
functions of the beta and gamma subunits are less clear but they seem
to be required for the replacement of GDP by GTP as well as for
membrane anchoring and receptor recognition.

The gamma subunits are small proteins (from 70 to 110 residues) that
are bound to the membrane via an isoprenyl group (either a farnesyl or
a geranyl-geranyl) covalently linked to their C-terminus. In mammals
there are at least 12 different isoforms of gamma subunits.

The Caenorhabditis elegans protein egl-10, which is a regulator of
G-protein signalling, contains a G-protein gamma-like domain.

We have developed a profile that spans the complete length of the
gamma subunit.

Description of pattern(s) and/or profile(s)

Sequences known to belong to this class detected by the profile
ALL, except for yeast and squid G-protein gamma.
Other sequence(s) detected in SWISS-PROT NONE.
Expert(s) to contact by email
Pennington S.R. srpenn@liverpool.ac.uk

Last update

November 1997 / First entry.

References

[1]
Pennington S.R.
Protein Prof. 2:16-315(1995).


glutaredoxin

PDOC00173

Glutaredoxin

Glutaredoxin [1,2,3], also known as thioltransferase, is a small
protein of approximately one hundred amino-acid residues. It functions
as an electron carrier in the glutathione-dependent synthesis of
deoxyribonucleotides by the enzyme ribonucleotide reductase. Like
thioredoxin, which functions in a similar way, glutaredoxin possesses
an active center disulfide bond. It exists in either a reduced or an
oxidized form where the two cysteine residues are linked in an
intramolecular disulfide bond.

Glutaredoxin has been sequenced in a variety of species. On the basis
of extensive sequence similarity, it has been proposed [4] that
vaccinia protein O2L is most probably a glutaredoxin. Finally, it must
be noted that phage T4 thioredoxin seems also to be evolutionarily
related.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVD]-[FYSA]-x(4)-C-[PV]-[FYWH]-C-x(2)-[TAV]-x(2,3)-[LIV]
[The two C's form the redox-active bond]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note: in position 5 of the pattern, all glutaredoxin sequences have
Pro while T4 thioredoxin has Val.
Last update

December 1999 / Pattern and text revised.

References

[1]
Gleason F.K., Holmgren A.
FEMS Microbiol. Rev. 54:271-298(1988).

[2]
Holmgren A.
Biochem. Soc. Trans. 16:95-96(1988).

[3]
Holmgren A.
J. Biol. Chem. 264:13963-13966(1989).

[4]
Johnson G.P., Goebel S.J., Perkus M.E., Davis S.W., Winslow J.P.,
Paoletti E.
Virology 181:378-381(1991).


Glyco_hydro_1

PDOC00495

Glycosyl hydrolases family 1 signatures



It has been shown [1 to 4] that the following glycosyl hydrolases can
be, on the basis of sequence similarities, classified into a single
family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as
  Agrobacterium strain ATCC 21400, Bacillus polymyxa, and Caldocellum
  saccharolyticum.
- Two plants (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the
  archaebacteria Sulfolobus solfataricus (genes bgaS and lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria
  such as Lactobacillus casei, Lactococcus lactis, and Staphylococcus
  aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli
  (genes bglB and ascB) and from Erwinia chrysanthemi (gene arbB).
- Plants myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 /
  EC 3.2.1.62). LPH, an integral membrane glycoprotein, is the enzyme
  that splits lactose in the small intestine. LPH is a large protein
  of about 1900 residues which contains four tandem repeats of a
  domain of about 450 residues which is evolutionarily related to the
  above glycosyl hydrolases.

One of the conserved regions in these enzymes is centered on a
conserved glutamic acid residue which has been shown [5], in the
beta-glucosidase from Agrobacterium, to be directly involved in
glycosidic bond cleavage by acting as a nucleophile. We have used this
region as a signature pattern. As a second signature pattern we
selected a conserved region, found in the N-terminal extremity of
these enzymes; this region also contains a glutamic acid residue.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-
[LIVMFAR]-[CSAGN] [E is the active site residue]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 12.

Note: this pattern will pick up the last two domains of LPH; the first
two domains, which are removed from the LPH precursor by
proteolytic processing, have lost the active site glutamate and
may therefore be inactive [4].

Consensus pattern F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-
[FYNH]-[NQ]-x-E-x-[GSTA]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note this pattern will pick up the last three domains of LPH. 
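
The notes on LPH imply that the two family 1 signatures can disagree on how many domains they detect in a multi-domain protein. The sketch below illustrates counting hits per signature; it assumes the patterns read [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] and F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA], the regular expressions are hand translations, the names SIGNATURES and count_hits are hypothetical, and the peptides are made up (they are not LPH sequence).

```python
import re

# Hand translations of the two glycosyl hydrolase family 1 signatures.
SIGNATURES = {
    "active_site": re.compile(
        r"[LIVMFSTC][LIVFYS][LIV][LIVMST]ENG[LIVMFAR][CSAGN]"),
    "n_terminal": re.compile(
        r"F.[FYWM][GSTA].[GSTA].[GSTA]{2}[FYNH][NQ].E.[GSTA]"),
}

def count_hits(sequence):
    """Count non-overlapping matches of each signature in the sequence."""
    return {name: len(regex.findall(sequence))
            for name, regex in SIGNATURES.items()}

# Illustrative domain with both motifs intact.
intact = "FLWGAATAAYQIEGA" + "PPPP" + "LYITENGAA"
# Same domain with the glutamate of the E-N-G motif replaced, mimicking
# a repeat that has lost its active site residue.
degenerate = "FLWGAATAAYQIEGA" + "PPPP" + "LYITQNGAA"

print(count_hits(intact))
print(count_hits(degenerate))
```

A repeat that keeps the N-terminal motif but has lost the glutamate is counted by the second signature only, which matches the pattern of hits described in the notes for LPH.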
Expert(s) to contact by email
Henrissat B. bernie@afmb.cnrs-mrs.fr

Last update

November 1995 / Patterns and text revised.

References

[1]
Henrissat B.
Biochem. J. 280:309-316(1991).

[2]
Henrissat B.
Protein Seq. Data Anal. 4:61-62(1991).

[3]
Gonzalez-Candelas L., Ramon D., Polaina J.
Gene 95:31-38(1990).

[4]
El Hassouni M., Henrissat B., Chippaux M., Barras F.
J. Bacteriol. 174:765-777(1992).

[5]
Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B.,
Aebersold R.
J. Am. Chem. Soc. 112:5887-5889(1990).


Glyco__hydro_1 9 


PDOC00620 


Chitinases family 19 
signatures 


Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the
hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in
chitin polymers. From the viewpoint of sequence similarity,
chitinases belong to either family 18 or 19 in the classification
of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known
as classes IA or I and IB or II) are enzymes from plants that
function in the defense against fungal and insect pathogens by
destroying their chitin-containing cell wall. Class IA/I and IB/II
enzymes differ in the presence (IA/I) or absence (IB/II) of an
N-terminal chitin-binding domain (see the relevant entry
<PDOC00025>). The catalytic domain of these enzymes consists of
about 220 to 230 amino acid residues.

As signature patterns we selected two highly conserved regions.
The first one is located in the N-terminal section and contains
one of the six cysteines which are conserved in most, if not all,
of these chitinases and which is probably involved in a disulfide
bond.

Description of pattern(s) and/or profile(s)

Consensus pattern C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-
[YF]-x(2)-F-[GSA]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-
[FY]-W-[LIVM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Expert(s) to contact by email 

Neuhaus J.-M. jean-marc.neuhaus@bota.unine.ch

Henrissat B. bernie@afmb.cnrs-mrs.fr 

Last update 

November 1997 / Text revised. 

References 

[1] 

Flach J., Pilet P.-E., Jolles P.
Experientia 48:701-716(1992).

[2] 

Henrissat B. 

Biochem. J. 280:309-316(1991). 
[E1]

http://www.expasy.ch/cgi-bin/lists?glycosid.txt


Glyco_hydro_3_C 


PDOC00621 


Glycosyl hydrolases
family 3 active site


It has been shown [1,2] that the following glycosyl hydrolases can
be, on the basis of sequence similarities, classified into a single
family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus
wentii (A-3), Hansenula anomala, Kluyveromyces fragilis,
Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum
commune and Trichoderma reesei (BGL1).

- Beta glucosidases from the bacteria Agrobacterium tumefaciens
(Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium
thermocellum (bglB), Escherichia coli (bglX), Erwinia
chrysanthemi (bgxA) and Ruminococcus albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the
corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a
conserved aspartic acid residue which has been shown [3], in
Aspergillus wentii beta-glucosidase A3, to be implicated in the
catalytic mechanism. We have used this region as a signature
pattern.

Description of pattern (s) and/or profile(s) 

Consensus pattern [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-
[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the active site residue]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Expert(s) to contact by email 
Henrissat B. bernie@afmb.cnrs-mrs.fr 

Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Henrissat B. 

Biochem. J. 280:309-316(1991). 
[2] 

Castle L.A., Smith K.D., Morris R.O. 
J. Bacteriol. 174:1478-1486(1992). 

[3] 

Bause E., Legler G. 

Biochim. Biophys. Acta 626:459-465(1980). 


Glyco_hydro_45 


PDOC00877 


Glycosyl hydrolases 
family 45 active site 


The microbial degradation of cellulose and xylans requires several
types of enzymes such as endoglucanases (EC 3.2.1.4),
cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases
(EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of
cellulolytic enzymes (cellulases) and xylanases which, on the
basis of sequence similarities, can be classified into families.
One of these families is known as the cellulase family K or as the
glycosyl hydrolases family 45 [3,E1]. The enzymes which are
currently known to belong to this family are listed below.

- Endoglucanase 5 from Humicola insolens.

- Endoglucanase 5 from Trichoderma reesei (egl5).

- Endoglucanase K from Fusarium oxysporum.

- Endoglucanase B from Pseudomonas fluorescens (celB).

- Endoglucanase 1 from Ustilago maydis (egl1).

The best conserved region in these enzymes is located in the
N-terminal section. It contains an aspartic acid residue which has
been shown [4] to act as a nucleophile in the catalytic mechanism.
We use this region as a signature pattern.

Description of pattern (s) and/or profile(s) 

Consensus pattern [STA]-T-R-Y-[FYW]-D-x(5)-[CA] [The D is an 
active site residue] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Expert(s) to contact by email
Henrissat B. bernie@afmb.cnrs-mrs.fr

Last update

November 1997 / Pattern and text revised.

References

[1]

Beguin P.

Annu. Rev. Microbiol. 44:219-248(1990).

[2]

Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren
R.A.J.

Microbiol. Rev. 55:303-315(1991).
[3] 

Henrissat B., Bairoch A. 
Biochem. J. 293:781-788(1993). 

[4] 

Davies G.J., Dodson G.G., Hubbard R.E., Tolley S.P., Dauter Z.,
Wilson K.S., Hjort C., Mikkelsen J.M., Rasmussen G., Schuelein
M.

Nature 365:362-364(1993). 
[E1] 

http://www.expasy.ch/cgi-bin/lists?glycosid.txt


Glyco_hydro_47 




Glycosyl hydrolase family 
47 


Members of this family are alpha-mannosidases that catalyse the
hydrolysis of the terminal 1,2-linked alpha-D-mannose residues in
the oligo-mannose oligosaccharide Man(9)(GlcNAc)(2). These
enzymes are capable of taking part in the glycosylation pathway
and glycoprotein processing.


GTP_cyclohydroI


PDOC00672 


GTP cyclohydrolase I
signatures


GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of
formic acid and dihydroneopterin triphosphate from GTP. This
reaction is the first step in the biosynthesis of tetrahydrofolate
in prokaryotes, of tetrahydrobiopterin in vertebrates, and of
pteridine-containing pigments in insects.

GTP cyclohydrolase I is a protein of 190 to 250 amino acid
residues. The comparison of the sequence of the enzyme from
bacterial and eukaryotic sources shows that the structure of this
enzyme has been extremely well conserved throughout evolution [1].

As signature patterns we selected two conserved regions. The first
contains a perfectly conserved tetrapeptide which is part of the
GTP-binding pocket [2]; the second region also contains conserved
residues involved in GTP-binding.

Description of pattern(s) and/or profile(s) 

Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-
x(3)-[ST]-x-C-E-H-H

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

July 1999 / Patterns and text revised. 

References 

[1] 

Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann
H.
Biochem. Biophys. Res. Commun. 212:705-711(1995).

[2]

Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A.
Structure 3:459-466(1995).


HCV_capsid 


Hepatitis C virus capsid
protein


Family members include nucleocapsid proteins of HCV. This virus
family comprises a nucleocapsid covered by a lipoprotein envelope.
The envelope consists of two proteins: protein M and glycoprotein
E. The nucleocapsid is a complex of protein C and mRNA. Uses for
these polypeptides include: immunological epitopes for vaccines;
or as mRNA chaperone proteins to aid in processing or to prevent
degradation.

References describing examples of these capsid polypeptides
include: Chen et al., Virology 188:102-113(1992); and Okamoto et
al., J. Gen. Virol. 72:2697-2704(1991).


HD 




HD domain 


Accession number: PF01966 

Definition: HD domain 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: -1 -1 

Trusted cutoffs: -0.50 -0.50 

Noise cutoffs: -2.50 -2.50 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 99085258 

Reference Title: The HD domain defines a new superfamily 
of metal-dependent 

Reference Title: phosphohydrolases. 

Reference Author: Aravind L, Koonin EV; 

Reference Location: Trends Biochem Sci 1998;23:469-472. 

Database Reference INTERPRO; IPR002819; 

Database reference: PFAMB; PB005654; 

Database reference: PFAMB; PB006725; 

Database reference: PFAMB; PB009617;

Database reference: PFAMB; PB012663;

Database reference: PFAMB; PB035384; 

Database reference: PFAMB; PB040597; 

Comment: HD domains are metal dependent 

phosphohydrolases. 

Number of members: 63 
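
Records of this kind are lists of "Field: value" lines in which a long value may wrap onto a following line and a field such as Comment may repeat. As a rough sketch of how such a record could be read programmatically, the snippet below collects the fields of the HD domain entry above into a dictionary. The function names, the set of recognised field names and the handling of wrapped lines are assumptions made for illustration only; the entry itself does not say what the two numbers in each cutoff line stand for, so the code keeps both values without interpreting them.

```python
# Field names recognised by this sketch (an assumption; extend as needed).
FIELD_NAMES = {
    "Accession number",
    "Definition",
    "Gathering cutoffs",
    "Trusted cutoffs",
    "Noise cutoffs",
    "Comment",
    "Number of members",
}


def parse_entry(text):
    """Collect 'Field: value' lines into a dict of strings.

    A line that does not start with a known field name is treated as a
    continuation of the previous field; repeated fields are concatenated.
    """
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip() in FIELD_NAMES:
            current = name.strip()
            value = value.strip()
        elif current is not None:
            value = line          # wrapped continuation line
        else:
            continue              # text before the first known field
        fields[current] = (fields.get(current, "") + " " + value).strip()
    return fields


def cutoffs(fields, name):
    """Return the numbers on a cutoff line as a tuple of floats."""
    return tuple(float(v) for v in fields[name].split())


ENTRY = """
Accession number: PF01966
Definition: HD domain
Gathering cutoffs: -1 -1
Trusted cutoffs: -0.50 -0.50
Noise cutoffs: -2.50 -2.50
Comment: HD domains are metal dependent
phosphohydrolases.
Number of members: 63
"""

record = parse_entry(ENTRY)
print(record["Accession number"], cutoffs(record, "Gathering cutoffs"))
print(record["Comment"])
```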


HDV_ag 




Hepatitis delta virus delta 
antigen 


Accession number: PF01517

Definition: Hepatitis delta virus delta antigen 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_808 (release 4.0) 

Gathering cutoffs: -8 -8 

Trusted cutoffs: 23.30 23.30 

Noise cutoffs: -40.50 -40.50 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 94065676 

Reference Title: Characterization of RNA-binding domains 

of hepatitis delta 

Reference Title: antigen. 

Reference Author: Poisson F, Roingeard P, Baillou A,
Dubois F, Bonelli F,

Reference Author: Calogero RA, Goudeau A;
Reference Location: J Gen Virol 1993;74:2473-2478.
Reference Number: [2] 
Reference Medline: 98362586 

Reference Title: Structural basis of the oligomerization of
hepatitis delta

Reference Title: antigen.

Reference Author: Zuccola HJ, Rozzelle JE, Lemon SM,
Erickson BW, Hogle JM;

Reference Location: Structure 1998;6:821-830.

Database Reference: SCOP; 1a92; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002506;

Database Reference PDB; 1a92 A; 12; 23; 

Database Reference PDB; 1a92 B; 12; 23; 

Database Reference PDB; 1a92 C; 12; 23; 

Database Reference PDB; 1a92 D; 12; 60; 

Database Reference PDB; 1a92 A; 47; 60; 

Database Reference PDB; 1a92 B; 47; 60; 

Database Reference PDB; 1a92 C; 47; 60; 

Comment: The hepatitis delta virus (HDV) encodes a
single protein, the

Comment: hepatitis delta antigen (HDAg). The central
region of this protein

Comment: has been shown to bind RNA [1]. Several
interactions are also

Comment: mediated by a coiled-coil region at the N
terminus of the protein [2].
Number of members: 145


hemolysinCabind 


PDOC00293 


Hemolysin-type calcium- 
binding region signature 


Gram-negative bacteria produce a number of proteins which are
secreted into the growth medium by a mechanism that does not
require a cleaved N-terminal signal sequence. These proteins, while
having different functions, seem [1] to share two properties: they
bind calcium and they contain a variable number of tandem repeats
consisting of a nine amino acid motif rich in glycine, aspartic
acid and asparagine. It has been shown [2] that such a domain is
involved in the binding of calcium ions in a parallel beta roll
structure. The proteins which are currently known to belong to
this category are:

- Hemolysins from various species of bacteria. Bacterial
hemolysins are exotoxins that attack blood cell membranes and
cause cell rupture. The hemolysins which are known to contain
such a domain are those from: E. coli (gene hlyA), A.
pleuropneumoniae (gene appA), A. actinomycetemcomitans and P.
haemolytica (leukotoxin) (gene lktA).

- Cyclolysin from Bordetella pertussis (gene cyaA). A
multifunctional protein which is both an adenylate cyclase and a
hemolysin.

- Extracellular zinc proteases: serralysin (EC 3.4.24.40) from
Serratia, prtB and prtC from Erwinia chrysanthemi and aprA from
Pseudomonas aeruginosa.

- Nodulation protein nodO from Rhizobium leguminosarum.

We derived a signature pattern from conserved positions in the
sequence of the calcium-binding domain.

Description of pattern (s) and/or profile(s) 

Consensus pattern D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this pattern is found once in nodO and the extracellular
proteases but up to 5 times in some hemolysins/cyclolysins.
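
Because the signature can occur several times in one protein, it can be useful to report every occurrence instead of only the first. The sketch below, offered as an illustration only, hand-translates the consensus pattern above into a Python regular expression and lists the 1-based start position of each match. The sequence is a made-up toy string, not a real hemolysin, and the names SIGNATURE and signature_starts are assumptions introduced for the example.

```python
import re

# Regex form of D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D. Wrapping it in a
# lookahead means occurrences that overlap would also be reported.
SIGNATURE = re.compile(r"(?=(D.[LI].{4}G.D.[LI].GG.{3}D))")


def signature_starts(sequence):
    """Return the 1-based start position of every signature match."""
    return [m.start() + 1 for m in SIGNATURE.finditer(sequence)]


# Made-up 19-residue segment that satisfies the pattern, repeated three
# times with a two-residue spacer; purely illustrative.
UNIT = "DALAAAAGADALAGGAAAD"
toy_protein = "KK".join([UNIT] * 3)

starts = signature_starts(toy_protein)
print(len(starts), starts)
```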
Last update 

October 1993 / Text revised.

References 

[1] 

Economou A., Hamilton W.D.O., Johnston A.W.B., Downie J.A.
EMBO J. 9:349-354(1990).

[2]

Baumann U., Wu S., Flaherty K.M., McKay D.B.
EMBO J. 12:3357-3364(1993).


Heptosyltranf




Heptosyltransferase 


Accession number: PF01075

Definition: Heptosyltransferase 

Author: Finn RD, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_839 (release 3.0) 

Gathering cutoffs: -40 -40 

Trusted cutoffs: -31.80 -31.80

Noise cutoffs: -47.10 -47.10 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98112827

Reference Title: Enzymatic synthesis of lipopolysaccharide in 
Escherichia 

Reference Title: coli. Purification and properties of 
heptosyltransferase I. 

Reference Author: Kadrmas JL, Raetz CR; 

Reference Location: J Biol Chem 1998;273:2799-2807. 

Database Reference INTERPRO; IPR002201; 

Database reference: PFAMB; PB021100;

Database reference: PFAMB; PB033445;

Database reference: PFAMB; PB041423;

Comment: Lipopolysaccharide is a major component of
the outer leaflet of

Comment: the outer membrane in Gram-negative
bacteria. It is composed of

Comment: three domains: lipid A, core oligosaccharide
and the O-antigen.

Comment: All of these enzymes transfer heptose to the
lipopolysaccharide
Comment: core.
Number of members: 46 


Herpes_alk_exo




Herpesvirus alkaline
exonuclease


Accession number: PF01771

Definition: Herpesvirus alkaline exonuclease

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_822 (release 4.2) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 318.00 318.00 

Noise cutoffs: -277.60 -277.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 85107093 

Reference Title: Studies on the herpes simplex virus alkaline 
nuclease: 

Reference Title: detection of type-common and type-specific 

epitopes on the 

Reference Title: enzyme. 

Reference Author: Banks LM, Halliburton IW, Purifoy DJ, 

Killington RA, Powell 

Reference Author: KL; 

Reference Location: J Gen Virol 1985;66:1-14.

Database Reference INTERPRO; IPR001616; 

Comment: This family includes various alkaline
exonucleases from

Comment: members of the herpesviridae. Alkaline
exonuclease

Comment: appears to have an important role in the
replication of

Comment: herpes simplex virus [1].
Number of members: 23 


Herpes_gI




Alphaherpesvirus
glycoprotein I


Accession number: PF01688

Definition: Alphaherpesvirus glycoprotein I

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1222 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 157.20 157.20

Noise cutoffs: -126.70 -126.70 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96357074

Reference Title: Biosynthesis of glycoproteins E and I of
feline

Reference Title: herpesvirus: gE-gI interaction is required for
Reference Title: intracellular transport.
Reference Author: Mijnes JD, van der Horst LM, van Anken
E, Horzinek MC,

Reference Author: Rottier PJ, de Groot RJ;
Reference Location: J Virol 1996;70:5466-5475.
Reference Number: [2] 
Reference Medline: 94267406 

Reference Title: Identification of the feline herpesvirus type 1 
(FHV-1) 

Reference Title: genes encoding glycoproteins G, D, I and E: 
expression of 

Reference Title: FHV-1 glycoprotein D in vaccinia and 
raccoon poxviruses. 

Reference Author: Spatz SJ, Rota PA, Maes RK; 
Reference Location: J Gen Virol 1994;75:1235-1244.
Reference Number: [3] 
Reference Medline: 94267879 

Reference Title: Unusual phosphorylation sequence in the
gpIV (gI) component

Reference Title: of the varicella-zoster virus gpI-gpIV
glycoprotein complex

Reference Title: (VZV gE-gI complex).
Reference Author: Yao Z, Grose C;
Reference Location: J Virol 1994;68:4204-4211.
Database Reference INTERPRO; IPR002874; 
Comment: This family consists of glycoprotein I from
various members of the

Comment: alphaherpesvirinae; these include
herpesvirus, varicella-zoster virus

Comment: and pseudorabies virus. Glycoprotein I (gI)
is important during natural

Comment: infection; mutants lacking gI produce smaller
lesions at the site of

Comment: infection and show reduced neuronal spread
[1]. gI forms a heterodimeric

Comment: complex with gE; this complex displays Fc
receptor activity (binds to

Comment: the Fc region of immunoglobulin) [1].
Glycoproteins are also important

Comment: in the production of virus-neutralizing
antibodies and cell mediated

Comment: immunity [2]. The alphaherpesvirinae have a
dsDNA genome and have no

Comment: RNA stage during viral replication.
Number of members: 22 


Herpes_glycop_D 




Herpesvirus glycoprotein 
M 


Accession number: PF01528 

Definition: Herpesvirus glycoprotein M 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_929 (release 4.0) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 197.30 197.30

Noise cutoffs: -229.70 -229.70 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 96357105 

Reference Title: Identification and characterization of 
pseudorabies virus 

Reference Title: glycoprotein gM as a nonessential virion 
component. 

Reference Author: Dijkstra JM, Visser N, Mettenleiter TC, 
Klupp BG; 

Reference Location: J Virol 1996;70:5684-5688.
Reference Number: [2]
Reference Medline: 95381611

Reference Title: Identification and molecular characterization 
of the murine 

Reference Title: cytomegalovirus homolog of the human 
cytomegalovirus UL100 
Reference Title: gene. 

Reference Author: Li W, Eidman K, Gehrz RC, Kari B; 
Reference Location: Virus Res 1995;36:163-175.
Database Reference INTERPRO; IPR000785;
Comment: The herpesvirus glycoprotein M (gM) is an
integral membrane protein

Comment: predicted to contain 8 transmembrane
segments [2]. Glycoprotein M is

Comment: not essential for viral replication [1].
Number of members: 24 


HesB-like 


PDOC00887 


Hypothetical
hesB/yadR/yfhF family
signature


The following uncharacterized proteins have been shown [1] to
share regions of similarities:

- Anabaena and related cyanobacteria protein hesB which may be
required for nitrogen fixation.

- Escherichia coli hypothetical protein yadR and HI1723, the
corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein ydiC.

- Escherichia coli hypothetical protein yfhF and HI0376, the
corresponding Haemophilus influenzae protein.

- Mycobacterium tuberculosis hypothetical protein Rv2204c.

- Synechocystis strain PCC 6803 hypothetical protein slr1417.

- Synechocystis strain PCC 6803 hypothetical protein slr1565.

- A hypothetical protein in the nifU 5' region of many nitrogen
fixing bacteria.

- Porphyra purpurea chloroplast hypothetical protein in apcF-rps4
intergenic region.

- Yeast hypothetical protein YLL027W.

- Yeast hypothetical protein YPR067W.

These are small proteins (106 to 135 amino-acid residues in
bacteria, about 200 residues in fungi) that contain a number of
conserved regions. The most noteworthy of these regions is located
in the C-terminal extremity; it contains two conserved cysteines.
We have used this region as a signature pattern.

Description of pattern (s) and/or profile(s) 

Consensus pattern F-x-[LIVMFY]-x-N-[PGHNSKQ]-x(4)-C-x-C- 
[GS]-x-S-F 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence (s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1]

Bairoch A., Rudd K.E. 
Unpublished observations (1995). 


HisG 


PDOC01020 


ATP
phosphoribosyltransferase
signature


ATP phosphoribosyltransferase (EC 2.4.2.17) is the enzyme that
catalyzes the first step in the biosynthesis of histidine in
bacteria, fungi and plants. It is a protein of about 23 to 32 Kd.
As a signature pattern we selected a region located in the
C-terminal part of this enzyme.

Description of pattern(s) and/or profile(s)

Consensus pattern E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-
G-x-T-[LM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

July 1998 / First entry.


histone 


PDOC00045 
PDOC00046 
PDOC00287 
PDOC00308 


Histone H2A signature; 
Histone H4 signature; 
Histone H3 signatures; 
Histone H2B signature 


Histone H2A is one of the four histones, along with H2B, H3 and
H4, which form the eukaryotic nucleosome core. Using alignments of
histone H2A sequences [1,2,E1] we selected, as a signature pattern,
a conserved region in the N-terminal part of H2A. This region is
conserved both in classical S-phase regulated H2As and in variant
histone H2As which are synthesized throughout the cell cycle.

Description of pattern(s) and/or profile(s) 
Consensus pattern [AC]-G-L-x-F-P-V 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 2. 
Last update 

November 1995 / Pattern and text revised. 

References 

[1] 

Wells D.E., Brown D.

Nucleic Acids Res. 19:2173-2188(1991). 

[2] 

Thatcher T.H., Gorovsky M.A. 
Nucleic Acids Res. 22:174-179(1994). 

[E1] 

http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/index.html 

Histone H4 is one of the four histones, along with H2A, H2B and
H3, which form the eukaryotic nucleosome core. Along with H3, it
plays a central role in nucleosome formation. The sequence of
histone H4 has remained almost invariant in more than 2 billion
years of evolution [1,E1]. The region we use as a signature pattern
is a pentapeptide found in positions 14 to 18 of all H4 sequences.
It contains a lysine residue which is often acetylated [2] and a
histidine residue which is implicated in DNA-binding [3].

Description of pattern(s) and/or profile(s)
Consensus pattern G-A-K-R-H 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 1.
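
The statement that the pentapeptide lies at positions 14 to 18 can be checked directly on a sequence. The sketch below is an illustration under stated assumptions: the 20-residue string is the N-terminal stretch of a typical histone H4 with the initiator methionine removed, quoted from general knowledge and not taken from this entry, and the names H4_N_TERMINUS and signature_span are hypothetical.

```python
# Assumed N-terminal 20 residues of a typical histone H4 (initiator Met
# removed); illustrative only, not taken from the entry above.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRK"
SIGNATURE = "GAKRH"


def signature_span(sequence, signature=SIGNATURE):
    """Return the 1-based (first, last) positions of the signature, or None."""
    index = sequence.find(signature)
    if index < 0:
        return None
    return index + 1, index + len(signature)


span = signature_span(H4_N_TERMINUS)
print(span)

# Residues of the signature singled out in the text (1-based positions
# within the assumed sequence): the lysine and the histidine.
lysine_position = span[0] + SIGNATURE.index("K")
histidine_position = span[0] + SIGNATURE.index("H")
print(lysine_position, histidine_position)
```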
Last update 

November 1995 / Text revised.

References

[1]

Thatcher T.H., Gorovsky M.A. 
Nucleic Acids Res. 22:174-179(1994). 

[2] 

Doenecke D., Gallwitz D. 

Mol. Cell. Biochem. 44:113-128(1982). 

[3] 

Ebralidse K.K., Grachev S.A., Mirzabekov A.D. 
Nature 331:365-367(1988).

[E1]

http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/index.html 

Histone H3 is one of the four histones, along with H2A, H2B and
H4, which form the eukaryotic nucleosome core. It is a highly
conserved protein of 135 amino acid residues [1,2,E1].

The following proteins have been found to contain a C-terminal
H3-like domain:

- Mammalian centromeric protein CENP-A [3]. Could act as a core
histone necessary for the assembly of centromeres.

- Yeast chromatin-associated protein CSE4 [4].

- Caenorhabditis elegans chromosome III encodes two highly related
proteins (F54C8.2 and F58A4.3) whose C-terminal section is
evolutionarily related to the last 100 residues of H3. The
function of these proteins is not yet known.

We developed two signature patterns. The first one corresponds to
a perfectly conserved heptapeptide in the N-terminal part of H3.
The second one is derived from a conserved region in the central
section of H3.

Description of pattern (s) and/or profile(s) 
Consensus pattern K-A-P-R-K-Q-L 

Sequences known to belong to this class detected by the pattern 
ALL, except for the H3-like proteins and some protozoan H3.
Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Patterns and text revised. 

References 

[1] 

Wells D.E., Brown D. 

Nucleic Acids Res. 19:2173-2188(1991). 

[2] 

Thatcher T.H., Gorovsky M.A. 
Nucleic Acids Res. 22:174-179(1994). 

[3] 

Sullivan K.F., Hechenberger M., Mash K. 
J. Cell Biol. 127:581-592(1994). 

[4] 

Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M.
Genes Dev. 9:573-586(1995).

[E1]

http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/index.html

Histone H2B is one of the four histones, along with H2A, H3 and
H4, which form the eukaryotic nucleosome core. Using alignments of
histone H2B sequences [1,2,E1], we selected a conserved region in
the C-terminal part of H2B.

Description of pattern(s) and/or profile(s)

Consensus pattern [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-
[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / Pattern and text revised. 

References 

[1] 

Wells D.E., Brown D. 

Nucleic Acids Res. 19:2173-2188(1991). 

[2] 

Thatcher T.H., Gorovsky M.A. 
Nucleic Acids Res. 22:174-179(1994). 

[E1] 

http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/index.html 


HMA 


PDOC00804 


Heavy-metal-associated
domain


A conserved domain of about 30 amino acid residues has been found
[1] in a number of proteins that transport or detoxify heavy
metals. This domain contains two conserved cysteines that could be
involved in the binding of these metals. The domain has been
termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see
<PDOC00139>). The human copper ATPases ATP7A and ATP7B which are
respectively involved in Menkes' and Wilson's diseases. ATP7A
and ATP7B both contain 6 tandem copies of the HMA domain. The
copper ATPases CCC2 from budding yeast, copA from Enterococcus
faecalis and synA from Synechococcus contain one copy of the HMA
domain. The cadmium ATPases cadA from Bacillus firmus and from
plasmid pI258 from Staphylococcus aureus also contain a single
HMA domain, while a chromosomal Staphylococcus aureus cadA
contains two copies. Other, less characterized ATPases that
contain the HMA domain are: fixI from Rhizobium meliloti, pacS
from Synechococcus (strain PCC7942), Mycobacterium leprae ctpA
and ctpB and Escherichia coli hypothetical protein yhhO. In all
these ATPases the HMA domain(s) are located in the N-terminal
section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally
encoded by plasmids carried by mercury-resistant Gram-negative
bacteria. Mercuric reductase is a class-1 pyridine
nucleotide-disulphide oxidoreductase (see <PDOC00073>). There is
generally one HMA domain (with the exception of a chromosomal
merA from Bacillus strain RC607 which has two) in the N-terminal
part of merA.

- Mercuric transport protein periplasmic component (gene merP),
also encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. It seems to be a mercury scavenger that
specifically binds to one Hg(2+) ion and which passes it to the
mercuric reductase via the merT protein. The N-terminal half of
merP is a HMA domain.

- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or
partitioning of copper.

The consensus pattern for HMA spans the complete domain. 
Description of pattern (s) and/or profile(s) 

Consensus pattern [LIVNS]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-
x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The two C's
probably bind metals]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 6. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Bull P.C., Cox D.W. 

Trends Genet. 10:246-252(1994). 

[2] 

Lin S.-J., Culotta V.L.

Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995). 


HMG-CoA_red


PDOC00064 


Hydroxymethylglutaryl-
coenzyme A reductase
signatures and profile


Hydroxymethylglutaryl-coenzyme A reductase (EC 1.1.1.34) (HMG-CoA
reductase) [1,2] catalyzes the NADP-dependent synthesis of
mevalonate from 3-hydroxy-3-methylglutaryl-CoA. In vertebrates,
HMG-CoA reductase is the rate-limiting enzyme in cholesterol
biosynthesis. In plants, mevalonate is the precursor of all
isoprenoid compounds.

HMG-CoA reductase is a membrane bound enzyme. Structurally, it
consists of 3 domains: an N-terminal region that contains a
variable number of transmembrane segments (7 in mammals, insects
and fungi; 2 in plants), a linker region and a C-terminal
catalytic domain of approximately 400 amino-acid residues.

In archaebacteria [3] HMG-CoA reductase, which is involved in the
biosynthesis of the isoprenoid side chains of lipids, seems to be
cytoplasmic and lack the N-terminal hydrophobic domain.

Some bacteria, such as Pseudomonas mevalonii, can use mevalonate
as the sole carbon source. These bacteria use an NAD-dependent
HMG-CoA reductase (EC 1.1.1.88) to deacetylate mevalonate into
3-hydroxy-3-methylglutaryl-CoA [3]. The Pseudomonas enzyme is
structurally related to the catalytic domain of NADP-dependent
HMG-CoA reductases.

We selected three conserved regions as signature patterns for
HMG-CoA reductases. The first is located in the center of the
catalytic domain, the second is a glycine-rich region located in
the C-terminal section of the same catalytic domain and the third
is also located in the C-terminal section and contains a histidine
residue that seems [4] to be implicated in the catalytic mechanism
as a general base.

Description of pattern(s) and/or profile(s) 

Consensus pattern [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 4. 

Consensus pattern [LIVM]-G-x-[LIVM]-G-G-[AG]-T

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT 5. 

Consensus pattern A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-
H-[LM]-x-[FYLH] [H is an active site residue]
Sequences known to belong to this class detected by the pattern 
ALL, except for archaebacterial HMG-CoA reductases. 
Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both a signature pattern 
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 
Last update 

November 1997 / Patterns and text revised; profile added. 

References 

[1] 

Caelles C., Ferrer A., Balcells L., Hegardt F.G., Boronat A.
Plant Mol. Biol. 13:627-638(1989). 

[2] 

Basson M.E., Thorsness M., Finer-Moore J., Stroud R.M., Rine J. 
Mol. Cell. Biol. 8:3797-3808(1988). 

[3]

Lam W.L., Doolittle W.F.
J. Biol. Chem. 267:5829-5834(1992).

[4]

Beach M.J., Rodwell V.W.
J. Bacteriol. 171:2994-3001(1989).

[5]

Darnay B.G., Wang Y., Rodwell V.W.
J. Biol. Chem. 267:15064-15070(1992).


HMGL-like 


PDOC00813 
PDOC00643 


Hydroxymethylglutaryl-
coenzyme A lyase active
site;

Alpha-isopropylmalate
and homocitrate
synthases signatures


3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL)
(EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and
acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved
in ketogenesis and in leucine catabolism [1]. In some bacteria, such as
Pseudomonas mevalonii, it is involved in mevalonate catabolism (gene mvaB).
A cysteine has been shown [2], in mvaB, to be required for the activity of
the enzyme. The region around this residue is perfectly conserved and is
used as a signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern S-V-A-G-L-G-G-C-P-Y [C is the active site 
residue] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / First entry. 

References 

[1] 

Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G.,
Behnke C.E., Mende-Mueller L.M., Schappert K., Lee C., Gibson
K.M., Miziorko H.M.
J. Biol. Chem. 268:4376-4381(1993).

[2]

Hruz P.W., Narasimhan C., Miziorko H.M.
Biochemistry 31:6842-6847(1992).

The following enzymes have been shown [1] to be functionally as well as
evolutionarily related:

- Alpha-isopropylmalate synthase (EC 4.1.3.12), which catalyzes the first
  step in the biosynthesis of leucine, the condensation of acetyl-CoA and
  alpha-ketoisovalerate to form 2-isopropylmalate.

- Homocitrate synthase (EC 4.1.3.21) (gene nifV), which is involved in the
  biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes
  the condensation of acetyl-CoA and alpha-ketoglutarate into homocitrate.

- Soybean late nodulin 56.

- Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392.

We have selected two conserved regions as signature 
patterns for these 

enzymes. The first region is located in the N-terminal section 
while the 

second region is located in the central section and contains two 
conserved 

histidine residues which could be implicated in the catalytic 
mechanism. 

Description of pattern(s) and/or profile(s)

Consensus pattern L-R-[DE]-G-x-Q-x(10)-K 

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x- 
[GASLI] 

Sequences known to belong to this class detected by the pattern
ALL.
Other sequence(s) detected in SWISS-PROT NONE.
Last update 

November 1997 / Patterns and text revised. 

References 

[1] 

Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. 
J. Bacteriol. 173:3041-3046(1991).


hormones 


PDOC00237 


Neurohypophysial 
hormones signature 


Oxytocin (or ocytocin) and vasopressin [1] are small (nine amino acid
residues), structurally and functionally related neurohypophysial peptide
hormones. Oxytocin causes contraction of the smooth muscle of the uterus and
of the mammary gland, while vasopressin has a direct antidiuretic action on
the kidney and also causes vasoconstriction of the peripheral vessels. Like
the majority of active peptides, both hormones are synthesized as larger
protein precursors that are enzymatically converted to their mature forms.
Peptides belonging to this family are also found in birds, fish, reptiles and
amphibians (mesotocin, isotocin, valitocin, glumitocin, aspargtocin,
vasotocin, seritocin, asvatocin, phasvatocin), in worms (annetocin), octopi
(cephalotocin), locust (locupressin or neuropeptide F1/F2) and in molluscs
(conopressins G and S) [2].

The pattern developed to detect this category of peptides spans 
their entire 

sequence and includes four invariant amino acid residues. 

Description of pattern(s) and/or profile(s)

Consensus pattern C-[LIFY](2)-x-N-[CS]-P-x-G [The two Cs are 
linked by a disulfide bond]. 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
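
Because this pattern is said to span the entire mature peptide, a whole-string match is the natural check. The sketch below hand-translates the consensus pattern into a Python regular expression and applies it to the nine-residue sequences of oxytocin and arginine vasopressin. The scrambled control and the helper name matches_whole_peptide are assumptions added for illustration.

```python
import re

# Hand translation of C-[LIFY](2)-x-N-[CS]-P-x-G.
NEUROHYPOPHYSIAL = re.compile(r"C[LIFY]{2}.N[CS]P.G")

PEPTIDES = {
    "oxytocin": "CYIQNCPLG",
    "arginine vasopressin": "CYFQNCPRG",
    "scrambled control": "GLPCNQIYC",  # same residues as oxytocin, shuffled
}


def matches_whole_peptide(sequence: str) -> bool:
    """Return True when the pattern covers the complete peptide."""
    return NEUROHYPOPHYSIAL.fullmatch(sequence) is not None


for name, sequence in PEPTIDES.items():
    print(f"{name:22s} {sequence}  {matches_whole_peptide(sequence)}")
```

When scanning a precursor protein rather than a mature peptide, `search` or `finditer` would be used in place of `fullmatch`.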
Last update 

November 1995 / Pattern and text revised. 

References 

[1]

Acher R., Chauvet J. 
Biochimie 70:1197-1207(1988). 

[2] 

Chauvet J., Michel G., Ouedraogo Y., Chou J., Chait B.T., Acher R.
Int. J. Pept. Protein Res. 45:482-487(1995).


HPPK 


PDOC00631 


7,8-dihydro-6- 
hydroxymethylpterin- 
pyrophosphokinase 
signature 


All organisms require reduced folate cofactors for the synthesis of 
a variety 

of metabolites. Most microorganisms must synthesize folate de 
novo because 

they lack the active transport system of higher vertebrate cells 
which allows 

these organisms to use dietary folates. Enzymes involved 
in folate 

biosynthesis are therefore targets for a variety of antimicrobial 
agents such 

as trimethoprim or sulfonamides. 

7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (EC 2.7.6.3) (HPPK)
catalyzes the attachment of pyrophosphate to 6-hydroxymethyl-7,8-dihydropterin
to form 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate. This is the first
step in a three-step pathway leading to 7,8-dihydrofolate.

Bacterial HPPK (gene folK or sulD) [1] is a protein of 160 to 270 amino
acids. In the lower eukaryote Pneumocystis carinii, HPPK is the central domain
of a multifunctional folate synthesis enzyme (gene fas) [2].

As a signature for HPPK, we selected a conserved region located 
in the central 

section of these enzymes. 

Description of pattern(s) and/or profile(s)

Consensus pattern [KRHD]-x-[GA]-[PSAE]-R-x(2)-D-[LIV]-D-
[LIVM](2)

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1999 / Pattern and text revised. 

References 

[1] 

Talarico T.L., Ray P.H., Dev I.K., Merrill B.M., Dallas W.S.
J. Bacteriol. 174:5971-5977(1992). 

[2] 

Volpes F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J.
Gene 112:213-218(1992).


HTH_AraC


PDOC00040 


Bacterial regulatory 
proteins, araC family 
signature and profile 


The many bacterial transcription regulation proteins which bind 
DNA through a 

'helix-turn-helix' motif can be classified into subfamilies on the 
basis of 

sequence similarities. One of these subfamilies groups together 
the following 
proteins [1,2]:

- aarP, a transcriptional activator of the 2'-N-acetyltransferase gene in
  Providencia stuartii.

- ada, an Escherichia coli and Salmonella typhimurium bifunctional protein
  that repairs alkylated guanine in DNA by transferring the alkyl group at
  the O(6) position to a cysteine residue in the enzyme. The methylated
  protein acts as a positive regulator of its own synthesis and of the alkA,
  alkB and aidB genes.

- adaA, a Bacillus subtilis bifunctional protein that acts both as a
  transcriptional activator of the ada operon and as a
  methylphosphotriester-DNA alkyltransferase.

- adiY, an Escherichia coli protein of unknown function.

- aggR, the transcriptional activator of aggregative adherence fimbria I
  expression in enteroaggregative Escherichia coli.

- appY, a protein which acts as a transcriptional activator of acid
  phosphatase and other proteins during the deceleration phase of growth and
  acts as a repressor for other proteins that are synthesized in exponential
  growth or in the stationary phase.

- araC, the arabinose operon regulatory protein, which activates the
  transcription of the araBAD genes.

- cafR, the Yersinia pestis F1 operon positive regulatory protein.

- celD, the Escherichia coli cel operon repressor.

- cfaD, a protein which is required for the expression of the CFA/I adhesin
  of enterotoxigenic Escherichia coli.

- csvR, a transcriptional activator of fimbrial genes in enterotoxigenic
  Escherichia coli.

- envY, the porin thermoregulatory protein, which is involved in the control
  of the temperature-dependent expression of several Escherichia coli
  envelope proteins such as ompF, ompC, and lamB.

- exsA, an activator of exoenzyme S synthesis in Pseudomonas aeruginosa.

- fapR, the positive activator for the expression of the 987P operon coding
  for the fimbrial protein in enterotoxigenic Escherichia coli.

- hrpB, a positive regulator of pathogenicity genes in Burkholderia
  solanacearum.

- invF, the Salmonella typhimurium invasion operon regulator.

- marA, which may be a transcriptional activator of genes involved in the
  multiple antibiotic resistance (mar) phenotype.

- melR, the melibiose operon regulatory protein, which activates the
  transcription of the melAB genes.

- mixE, a Shigella flexneri protein necessary for secretion of ipa
  invasins.

- mmsR, the transcriptional activator for the mmsAB operon in Pseudomonas
  aeruginosa.

- msmR, the multiple sugar metabolism operon transcriptional activator in
  Streptococcus mutans.

- pchR, a Pseudomonas aeruginosa activator for pyochelin and ferripyochelin
  receptor.

- perA, a transcriptional activator of the eaeA gene for intimin in
  enteropathogenic Escherichia coli.

- pocR, a Salmonella typhimurium regulator of the cobalamin biosynthesis
  operon.

- pqrA, from Proteus vulgaris.

- rafR, the regulator of the raffinose operon in Pediococcus pentosaceus.

- ramA, from Klebsiella pneumoniae.

- rhaR, the Escherichia coli and Salmonella typhimurium L-rhamnose operon
  transcriptional activator.

- rhaS, an Escherichia coli and Salmonella typhimurium positive activator of
  genes required for rhamnose utilization.

- rns, a protein which is required for the expression of the cs1 and cs2
  adhesins of enterotoxigenic Escherichia coli.

- rob, a protein which binds to the right arm of the replication origin oriC
  of the Escherichia coli chromosome.

- soxS, a protein that, with the soxR protein, controls a superoxide response
  regulon in Escherichia coli.

- tetD, a protein from transposon TN10.

- tcpN or toxT, the Vibrio cholerae transcriptional activator of the tcp
  operon involved in pilus biosynthesis and transport.

- thcR, a probable regulator of the thc operon for the degradation of the
  thiocarbamate herbicide EPTC in Rhodococcus sp. strain NI86/21.

- ureR, the transcriptional activator of the plasmid-encoded urease operon in
  Enterobacteriaceae.

- virF and lcrF, the Yersinia virulence regulon transcriptional activator.

- virF, the Shigella transcriptional factor of invasion related antigens
  ipaBCD.

- xylR, the Escherichia coli xylose operon regulator.

- xylS, the transcriptional activator of the Pseudomonas putida TOL plasmid
  (pWW0, pWW53 and pDK1) meta operon (xylDLEGF genes).

- yfeG, an Escherichia coli hypothetical protein.

- yhiW, an Escherichia coli hypothetical protein.

- yhiX, an Escherichia coli hypothetical protein.

- yidL, an Escherichia coli hypothetical protein.

- yijO, an Escherichia coli hypothetical protein.

- yuxC, a Bacillus subtilis hypothetical protein.

- yzbC, a Bacillus subtilis hypothetical protein.

Except for celD, all of these proteins seem to be positive transcriptional
factors. Their sizes range from 107 (soxS) to 529 (yzbC) residues.

The helix-turn-helix motif is located in the third quarter of most of the
sequences; the N-terminal and central regions of these proteins are presumed
to interact with effector molecules and may be involved in dimerization [3].
The minimal DNA binding domain, which spans roughly 100 residues and
comprises the HTH motif, contains another region with similarity to the
classical HTH domain. However, it contains an insertion of one residue in the
turn-region.

A signature pattern was derived from the region that follows the first HTH
domain and that includes the totality of the putative second HTH domain. A
more sensitive detection of members of the araC family is available through
the use of a profile which spans the minimal DNA-binding region of 100
residues.

Description of pattern(s) and/or profile(s)

Consensus pattern [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-
x(2)-[LIVMSA]-x(4,9)-[LIVMF]-x(2)-[LIVMSTA]-[GSTACIL]-x(3)-
[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-[FYIVA]-{FYWHCM}-x(3)-
[GSADENQKR]-x-[NSTAPKL]-[PARL]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 37. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both a signature pattern
and a profile. As the profile is much more sensitive than the
pattern, you should use it if you have access to the necessary
software tools to do so.

Expert(s) to contact by email 

Ramos J.L iiramos@samba.cnb.uam.es 

Gallegos M.-T. mtrini@samba.cnb.uam.es 

Last update 

November 1997 / Text revised. 

References 

[1]

Gallegos M.-T., Michan C., Ramos J.L.
Nucleic Acids Res. 21:807-810(1993).

[2]

Henikoff S., Wallace J.C., Brown J.P.
Meth. Enzymol. 183:111-132(1990).

[3]

Bustos S.A., Schleif R.F.
Proc. Natl. Acad. Sci. U.S.A. 90:5638-5642(1993).


Hydrolase 




haloacid dehalogenase-
like hydrolase


Accession number: PF00702 

Definition: haloacid dehalogenase-like hydrolase 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_566 (release 2.1)

Gathering cutoffs: 7 7

Trusted cutoffs: 7.10 7.10

Noise cutoffs: 2.90 2.90

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96355356 

Reference Title: Crystal structure of L-2-haloacid dehalogenase from
Reference Title: Pseudomonas sp. YL. An alpha/beta hydrolase structure that
Reference Title: is different from the alpha/beta hydrolase fold.
Reference Author: Hisano T, Hata Y, Fujii T, Liu JQ, Kurihara T, Esaki N,
Reference Author: Soda K;
Reference Location: J Biol Chem 1996;271:20322-20330.
Database Reference: SCOP; 1jud; sf; [SCOP-USA][CATH-PDBSUM]
Database Reference: INTERPRO; IPR001454;
Database Reference: PDB; 1jud ; 4; 197;
Database Reference: PDB; 1zrm ; 4; 197;
Database Reference: PDB; 1zrn ; 4; 197;
Database Reference: PDB; 1aq6 A; 2; 193;
Database Reference: PDB; 1aq6 B; 2; 193;
Database Reference: PDB; 1qq5 A; 2; 193;
Database Reference: PDB; 1qq5 B; 2; 193;
Database Reference: PDB; 1qq6 A; 2; 193;
Database Reference: PDB; 1qq6 B; 2; 193;
Database Reference: PDB; 1qq7 A; 2; 193;
Database Reference: PDB; 1qq7 B; 2; 193;
Database Reference: PDB; 1cqz A; 4; 19;
Database Reference: PDB; 1cr6 A; 4; 19;
Database Reference: PDB; 1cqz B; 4; 206;
Database Reference: PDB; 1cr6 B; 4; 206;
Database Reference: PDB; 1cqz A; 48; 206;
Database Reference: PDB; 1cr6 A; 48; 206;
Database Reference: PFAMB; PB000701;
Database Reference: PFAMB; PB001048;
Database Reference: PFAMB; PB019234;
Database Reference: PFAMB; PB032787;
Database Reference: PFAMB; PB040985;
Database Reference: PFAMB; PB041061;
Database Reference: PFAMB; PB041182;
Database Reference: PFAMB; PB041477;
Database Reference: PFAMB; PB041535;
Database Reference: PFAMB; PB041628;
Database Reference: PFAMB; PB041677;

Comment: This family is structurally different from the alpha/
Comment: beta hydrolase family (abhydrolase).
Comment: This family includes L-2-haloacid dehalogenase, epoxide
Comment: hydrolases and phosphatases.
Comment: The structure of the family consists of two domains. One
Comment: is an inserted four helix bundle, which is the least well
Comment: conserved region of the alignment, between residues 16 and
Comment: 96 of Swiss: P24069. The rest of the fold is composed of the
Comment: core alpha/beta domain.
Number of members: 134
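
The gathering, trusted and noise cutoff lines in these Pfam entries each carry two numbers. The sketch below shows one way such a line could be parsed and used to filter hits. It assumes, following the usual Pfam convention, that the first value is a per-sequence bit-score threshold and the second a per-domain threshold; the entry itself does not state this. The names Cutoffs, parse_cutoffs and passes_gathering are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    sequence: float  # assumed: per-sequence bit-score threshold
    domain: float    # assumed: per-domain bit-score threshold


def parse_cutoffs(text: str) -> Cutoffs:
    """Parse the value part of a cutoff line, e.g. '7.10 7.10'."""
    sequence, domain = (float(value) for value in text.split())
    return Cutoffs(sequence, domain)


def passes_gathering(seq_score: float, dom_score: float, ga: Cutoffs) -> bool:
    """A hit is kept only if both scores reach the gathering cutoffs."""
    return seq_score >= ga.sequence and dom_score >= ga.domain


# Values copied from the PF00702 entry above.
gathering = parse_cutoffs("7 7")
trusted = parse_cutoffs("7.10 7.10")
noise = parse_cutoffs("2.90 2.90")

print(passes_gathering(trusted.sequence, trusted.domain, gathering))  # True
print(passes_gathering(noise.sequence, noise.domain, gathering))      # False
```

Under this reading the trusted cutoffs sit at or above the gathering cutoffs and the noise cutoffs below them, which is what the values quoted for this family show.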


HypB_UreG




HypB/UreG nucleotide- 
binding domain 


Accession number: PF01495 

Definition: HypB/UreG nucleotide-binding domain 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_428 (release 4.0)

Gathering cutoffs: 25 25

Trusted cutoffs: 197.70 197.70

Noise cutoffs: -40.00 -40.00 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97285753 

Reference Title: The HypB protein from Bradyrhizobium japonicum can store
Reference Title: nickel and is required for the nickel-dependent
Reference Title: transcriptional regulation of hydrogenase.
Reference Author: Olson JW, Fu C, Maier RJ;
Reference Location: Mol Microbiol 1997;24:119-128.
Reference Number: [2]
Reference Medline: 97352660
Reference Title: Characterization of UreG, identification of a
Reference Title: UreD-UreF-UreG complex, and evidence suggesting that a
Reference Title: nucleotide-binding site in UreG is required for in vivo
Reference Title: metallocenter assembly of Klebsiella aerogenes urease.
Reference Author: Moncrief MB, Hausinger RP;
Reference Location: J Bacteriol 1997;179:4081-4086.
Reference Number: [3]
Reference Medline: 93139028
Reference Title: The product of the hypB gene, which is required for nickel
Reference Title: incorporation into hydrogenases, is a novel guanine
Reference Title: nucleotide-binding protein.
Reference Author: Maier T, Jacobi A, Sauter M, Bock A;
Reference Location: J Bacteriol 1993;175:630-635.
Reference Number: [4]
Reference Medline: 92325016
Reference Title: Klebsiella aerogenes urease gene cluster: sequence of ureD
Reference Title: and demonstration that four accessory genes (ureD, ureE,
Reference Title: ureF, and ureG) are involved in nickel metallocenter
Reference Title: biosynthesis.
Reference Author: Lee MH, Mulrooney SB, Renner MJ, Markowicz Y, Hausinger RP;
Reference Location: J Bacteriol 1992;174:4324-4330.
Database Reference: INTERPRO; IPR002894;
Comment: This domain is found in HypB, a hydrogenase expression / formation
Comment: protein, and UreG, a urease accessory protein. Both these proteins contain
Comment: a P-loop nucleotide binding motif [2,3]. HypB has GTPase activity
Comment: and is a guanine nucleotide binding protein [3]. It is not known
Comment: whether UreG binds GTP or some other nucleotide. Both enzymes are involved
Comment: in nickel binding. HypB can store nickel and is required for nickel
Comment: dependent hydrogenase expression [1]. UreG is required for functional
Comment: incorporation of the urease nickel metallocenter [4]. GTP hydrolysis may
Comment: be required by these proteins for nickel incorporation into other nickel
Comment: proteins [1].

Number of members: 41 


IBB 


Importin beta binding
domain


Accession number: PF01749

Definition: Importin beta binding domain

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_544 (release 4.2)

Gathering cutoffs: 25 25

Trusted cutoffs: 67.30 67.30

Noise cutoffs: -15.90 -15.90

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98359119

Reference Title: Crystallographic analysis of the recognition of a nuclear
Reference Title: localization signal by the nuclear import factor
Reference Title: karyopherin alpha.
Reference Author: Conti E, Uy M, Leighton L, Blobel G, Kuriyan J;
Reference Location: Cell 1998;94:193-204.
Reference Number: [2]
Reference Medline: 98275030
Reference Title: Importins and exportins: how to get in and out of the
Reference Title: nucleus [published erratum appears in Trends Biochem Sci
Reference Title: 1998 Jul;23(7):235]
Reference Author: Weis K;
Reference Location: Trends Biochem Sci 1998;23:185-189.
Reference Number: [3]
Reference Medline: 98250643
Reference Title: Transport into and out of the cell nucleus.
Reference Author: Gorlich D;
Reference Location: EMBO J 1998;17:2721-2727.
Reference Number: [4]
Reference Medline: 96270582
Reference Title: The binding site of karyopherin alpha for karyopherin beta
Reference Title: overlaps with a nuclear localization sequence.
Reference Author: Moroianu J, Blobel G, Radu A;
Reference Location: Proc Natl Acad Sci U S A 1996;93:6572-6576.
Reference Number: [5]
Reference Medline: 96203101
Reference Title: A 41 amino acid motif in importin-alpha confers binding to
Reference Title: importin-beta and hence transit into the nucleus.
Reference Author: Gorlich D, Henklein P, Laskey RA, Hartmann E;
Reference Location: EMBO J 1996;15:1810-1817.
Database Reference: SCOP; 1bk5; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference: INTERPRO; IPR002652;
Database Reference: PDB; 1eji I; 72; 99;
Database Reference: PDB; 1ejy I; 72; 99;
Database Reference: PDB; 1ial A; 44; 99;
Database Reference: PDB; 1qgr B; 28; 51;
Database Reference: PDB; 1qgk B; 11; 54;
Database Reference: PDB; 1ee5 A; 90; 110;
Database Reference: PDB; 1bk5 A; 89; 110;
Database Reference: PDB; 1bk5 B; 89; 110;
Database Reference: PDB; 1bk6 A; 89; 110;
Database Reference: PDB; 1bk6 B; 89; 110;
Database Reference: PDB; 1ee4 A; 87; 110;
Database Reference: PDB; 1ee4 B; 87; 110;

Comment: This family consists of the importin alpha (karyopherin alpha),
Comment: importin beta (karyopherin beta) binding domain. The domain mediates
Comment: formation of the importin alpha beta complex; required for classical
Comment: NLS import of proteins into the nucleus, through the nuclear pore
Comment: complex and across the nuclear envelope.
Comment: Also in the alignment is the NLS of importin alpha which overlaps
Comment: with the IBB domain [4].
Number of members: 38


IF-2B 




initiation factor 2 subunit 
family 


Accession number: PF01008 

Definition: Initiation factor 2 subunit family 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1302 (release 3.0)

Gathering cutoffs: -135 -135

Trusted cutoffs: -82.40 -82.40

Noise cutoffs: -157.30 -157.30

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98188271

Reference Title: Archaeal translation initiation revisited: the initiation
Reference Title: factor 2 and eukaryotic initiation factor 2B
Reference Title: alpha-beta-delta subunit families.
Reference Author: Kyrpides NC, Woese CR;
Reference Location: Proc Natl Acad Sci U S A 1998;95:3726-3730.
Database Reference: INTERPRO; IPR000649;
Comment: This family includes initiation factor 2B alpha, beta and delta
Comment: subunits from eukaryotes, initiation factor 2B subunits 1 and 2
Comment: from archaebacteria and some proteins of unknown function from
Comment: prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the
Comment: small ribosomal subunit.
Number of members: 33


IF3 


PDOC00723 


initiation factor 3 
signature 


Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors
required for the initiation of protein biosynthesis in bacteria. IF-3 is
thought to function as a fidelity factor during the assembly of the ternary
initiation complex, which consists of the 30S ribosomal subunit, the initiator
tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it is a
basic protein of 141 to 212 residues.

The chloroplast initiation factor IF-3(chl) is a protein that enhances the
poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal
30S subunits. In its mature form it is a protein of about 400 residues whose
central section is evolutionarily related to the sequence of bacterial
IF-3 [2].

As a signature pattern we selected a highly conserved region located in the
central section of bacterial IF-3 and of IF-3(chl).

Description of pattern(s) and/or profile(s)

Consensus pattern [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-
[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update

July 1999 / Pattern and text revised.
References
[1]

Liveris D., Schwartz J.J., Geertman R., Schwartz I.
FEMS Microbiol. Lett. 112:211-216(1993).

[2]

Lin Q., Ma L., Burkhart W., Spremulli L.L.
J. Biol. Chem. 269:9436-9444(1994).


IF4E 


PDOC00641 


Eukaryotic initiation factor 
4E signature 


Eukaryotic translation initiation factor 4E (eIF-4E) [1] is a protein that
binds to the cap structure of eukaryotic cellular mRNAs. eIF-4E recognizes and
binds the 7-methylguanosine-containing (m7Gppp) cap during an early step in
the initiation of protein synthesis and facilitates ribosome binding to an
mRNA by inducing the unwinding of its secondary structures.

eIF-4E is a conserved protein of about 25 Kd. Site-directed mutagenesis
experiments have shown [2] that a tryptophan in the central part of the
sequence of human eIF-4E seems to be implicated in cap-binding. The signature
pattern for eIF-4E includes this tryptophan.

Description of pattern(s) and/or profile(s)

Consensus pattern [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-
[DVA]-x(5)-G-G-[KR]-W [The first W seems to be involved in cap-
binding]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
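
Several entries annotate one residue inside the pattern, here the first tryptophan. A capturing group around that residue lets a scan report its position in the sequence. The sketch below is illustrative only: the regular expression is a hand translation of the eIF-4E pattern, the test sequence is made up rather than taken from a real protein, and the function name cap_binding_trp_positions is hypothetical.

```python
import re

# Hand translation of the eIF-4E pattern; the parentheses capture the first W.
EIF4E = re.compile(r"[DE][IFY].{2}F[KR].{2}[LIVM].P.(W)E[DVA].{5}GG[KR]W")


def cap_binding_trp_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the annotated W for every pattern match."""
    return [match.start(1) + 1 for match in EIF4E.finditer(sequence)]


# Synthetic sequence: two flanking residues on each side of an artificial motif.
motif = "DYAAFKAALAPAWED" + "AAAAA" + "GGKW"
synthetic = "MM" + motif + "MM"

print(cap_binding_trp_positions(synthetic))  # [15]
```

The same approach applies to other annotated residues in this section, such as the active-site cysteine of HMG-CoA lyase.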
Last update 

July 1999 / Pattern and text revised. 

References 

[1] 

Thach R.E. 

Cell 68:177-180(1992). 
[2] 

Ueda H., Iyo H., Doi M., Inoue M., Ishida T., Morioka H., Tanaka
T., Nishikawa S., Uesugi S.
FEBS Lett. 280:207-210(1991).


IF5_eIF4_eIF2




eIF4-gamma/eIF5/eIF2-
epsilon


Accession number: PF02020

Definition: eIF4-gamma/eIF5/eIF2-epsilon

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: [1]

Gathering cutoffs: 25 25

Trusted cutoffs: 26.10 26.10

Noise cutoffs: -21.50 -21.50

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96060092

Reference Title: Multidomain organization of eukaryotic guanine nucleotide
Reference Title: exchange translation initiation factor eIF-2B subunits
Reference Title: revealed by analysis of conserved sequence motifs.
Reference Author: Koonin EV;
Reference Location: Protein Sci 1995;4:1608-1617.
Comment: This domain of unknown function is found at the C-terminus
Comment: of several translation initiation factors [1].
Number of members: 31




PDOC00262


Immunoglobulins and
major histocompatibility
complex proteins
signature

The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two
light chains and two heavy chains linked by disulfide bonds. There are two
types of light chains: kappa and lambda, each composed of a constant domain
(CL) and a variable domain (VL). There are five types of heavy chains: alpha,
delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and
three (in alpha, delta and gamma) or four (in epsilon and mu) constant
domains (CH1 to CH4).

The major histocompatibility complex (MHC) molecules are made of two chains.
In class I [2] the alpha chain is composed of three extracellular domains, a
transmembrane region and a cytoplasmic tail. The beta chain
(beta-2-microglobulin) is composed of a single extracellular domain. In
class II [3], both the alpha and the beta chains are composed of two
extracellular domains, a transmembrane region and a cytoplasmic tail.

It is known [4,5] that the Ig constant chain domains and a single
extracellular domain in each type of MHC chain are related. These
homologous domains are approximately one hundred amino acids long and
include a conserved intradomain disulfide bond. We developed a small pattern
around the C-terminal cysteine involved in this disulfide bond which can be
used to detect this category of Ig related proteins.

Description of pattern(s) and/or profile(s) 

Consensus pattern [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:
 Ig heavy chains type Alpha C region: All, in CH2 and CH3.
 Ig heavy chains type Delta C region: All, in CH3.
 Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
 Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
 Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
 Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
 Ig light chains type Lambda C region: In all CL except rabbit.
 MHC class I alpha chains: All, in alpha-3 domains, including in the
  cytomegalovirus MHC-1 homologous protein [6].
 Beta-2-microglobulin: All.
 MHC class II alpha chains: All, in alpha-2 domains.
 MHC class II beta chains: All, in beta-2 domains.
Other sequence(s) detected in SWISS-PROT 71.
Last update 

May 1991 / Text revised. 

References 

[1]

Gough N.
Trends Biochem. Sci. 6:203-205(1981).
[2]

Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).

[3]

Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).

[4]

Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).

[5]

Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).

[6]

Beck S., Barrell B.G.
Nature 331:269-272(1988).


IMPDH_C


PDOC00391 


IMP dehydrogenase/ 
GMP reductase 
signature 


IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction
of de novo GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1].
Inhibition of IMP dehydrogenase activity results in the cessation of DNA
synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a
possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are
tetramers of identical chains. There are two IMP dehydrogenase isozymes in
humans [2].

GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent
reductive deamination of GMP into IMP [3]. It converts nucleobase, nucleoside
and nucleotide derivatives of G to A nucleotides, and maintains the
intracellular balance of A and G nucleotides.

IMP dehydrogenase and GMP reductase share many regions of sequence similarity.
One of these regions is centered on a cysteine residue thought [3] to be
involved in binding IMP. We have used this region as a signature pattern.

Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T
[C is the putative IMP-binding residue]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

May 1991 / First entry.

References 

[1] 

Collart F.R., Huberman E. 

J. Biol. Chem. 263:15769-15772(1988). 

[2] 

Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G.,
Suzuki K.

J. Biol. Chem. 265:5292-5295(1990).
[3]

Andrews S.C., Guest J.R.
Biochem. J. 255:35-43(1988).


Inos-1-P_synth




Myo-inositol-1-phosphate
synthase


Accession number: PF01658
Definition: Myo-inositol-1-phosphate synthase
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_959 (release 4.1)
Gathering cutoffs: 25 25
Trusted cutoffs: 86.80 86.80
Noise cutoffs: -219.00 -219.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate -seed 0 HMM
Reference Number: [1]
Reference Medline: 95066381
Reference Title: Comparison of INO1 gene sequences and products in Candida
Reference Title: albicans and Saccharomyces cerevisiae.
Reference Author: Klig LS, Zobel PA, Devry CG, Losberger C;
Reference Location: Yeast 1994;10:789-800.
Database Reference INTERPRO; IPR002587;
Comment: This is a family of myo-inositol-1-phosphate synthases.
Comment: Inositol-1-phosphate catalyses the conversion of glucose-6-
Comment: phosphate to inositol-1-phosphate, which is then dephosphorylated
Comment: to inositol [1]. Inositol phosphates play an important role in
Comment: signal transduction.
Number of members: 27



IPP isomerase



Isopentenyl-diphosphate
delta-isomerase



Accession number: PF01772
Definition: Isopentenyl-diphosphate delta-isomerase
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1099 (release 4.2)
Gathering cutoffs: -88 -88
Trusted cutoffs: -66.70 -66.70
Noise cutoffs: -106.90 -106.90
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate -seed 0 HMM



Reference Number: [1]
Reference Medline: 98409684
Reference Title: Differential expression of two isopentenyl pyrophosphate
Reference Title: isomerases and enhanced carotenoid accumulation in a
Reference Title: unicellular chlorophyte.
Reference Author: Sun Z, Cunningham FX Jr, Gantt E;
Reference Location: Proc Natl Acad Sci U S A 1998;95:11482-11488.
Reference Number: [2]
Reference Medline: 97373600
Reference Title: Cloning and subcellular localization of hamster and rat
Reference Title: isopentenyl diphosphate dimethylallyl diphosphate
Reference Title: isomerase. A PTS1 motif targets the enzyme to peroxisomes.
Reference Author: Paton VG, Shackelford JE, Krisans SK;
Reference Location: J Biol Chem 1997;272:18945-18950.
Database Reference INTERPRO; IPR002667;
Comment: Isopentenyl-diphosphate delta-isomerase or IPP isomerase EC:5.3.3.2
Comment: catalyses the interconversion of isopentenyl diphosphate and
Comment: dimethylallyl diphosphate. Dimethylallyl phosphate is the initial substrate
Comment: for the biosynthesis of carotenoids and other long chain isoprenoids [1].
Number of members: 24



K-box



PDOC00302



MADS-box domain 
signature and profile 



A number of transcription factors contain a conserved domain of 
56 amino-acid 

residues, sometimes known as the MADS-box domain [E1]. They 
are listed below: 



- Serum response factor (SRF) [1], a mammalian transcription
factor that

binds to the Serum Response Element (SRE). This is a short
sequence of dyad

symmetry located 300 bp to the 5' end of the transcription
initiation site

of genes such as c-fos.

- Mammalian myocyte-specific enhancer factors 2A to 2D 
(MEF2A to MEF2D). 

These proteins are transcription factors which bind specifically
to the

MEF2 element present in the regulatory regions of many 
muscle-specific 
genes. 

- Drosophila myocyte-specific enhancer factor 2 (MEF2). 
- Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional
regulator of 

mating-type-specific genes. 

- Yeast arginine metabolism regulation protein I (gene ARGR1 or 
ARG80). 

- Yeast transcription factor RLM1.

- Yeast transcription factor SMP1.

- Arabidopsis thaliana agamous protein (AG) [3], a probable
transcription

factor involved in regulating genes that determine stamen
and carpel 

development in wild-type flowers. Mutations in the AG gene 
result in the 

replacement of the stamens by petals and the carpels by a new 
flower. 

- Arabidopsis thaliana homeotic proteins Apetala1 (AP1),
Apetala3 (AP3) and

Pistillata (PI) which act locally to specify the identity of the
floral 

meristem and to determine sepal and petal development [4]. 

- Antirrhinum majus and tobacco homeotic protein deficiens 
(DEFA) and globosa 

(GLO) [5]. Both proteins are transcription factors involved in the 
genetic 

control of flower development. Mutations in DEFA or GLO 
cause the 

transformation of petals into sepals and of stamina into carpels. 

- Arabidopsis thaliana putative transcription factors AGL1 to
AGL6 [6]. 

- Antirrhinum majus morphogenetic protein DEF H33 (squamosa). 

In SRF, the conserved domain has been shown [1] to be involved 
in DNA-binding 

and dimerization. We have derived a pattern that spans the 
complete length of 

the domain. The profile also spans the length of the MADS-box. 

Description of pattern(s) and/or profile(s)

Consensus pattern R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-
[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-
[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both signature patterns 
and a profile. As the profile is much more sensitive than the 
patterns, you should use it if you have access to the necessary 
software tools to do so. 
Last update 

July 1999 / Pattern and text revised. 

References 

[1]

Norman C., Runswick M., Pollock R., Treisman R.
Cell 55:989-1003(1988).
[2]

Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K.
J. Mol. Biol. 204:593-606(1988).

[3]

Yanofsky M.F., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M.
Nature 346:35-39(1990).

[4]

Goto K., Meyerowitz E.M.
Genes Dev. 8:1548-1560(1994).

[5]

Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E.,
Saedler H., Sommer H., Schwartz-Sommer Z.
EMBO J. 11:4693-4704(1992).

[6]

Ma H., Yanofsky M.F., Meyerowitz E.M.
Genes Dev. 5:484-495(1991).

[E1]

http://transfac.gbf-braunschweig.de/cgi-bin/qt/getEntry.pl?C0014


Keratin_B2 




Keratin, high sulfur B2 
protein 


Accession number: PF01500 

Definition: Keratin, high sulfur B2 protein 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_706 (release 4.0)

Gathering cutoffs: -17 -17

Trusted cutoffs: -1.50 -1.50

Noise cutoffs: -46.00 18.50 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98201605 

Reference Title: Structure and hair follicle-specific 

expression of genes 

Reference Title: encoding the rat high sulfur protein B2 
family. 

Reference Author: Mitsui S, Ohuchi A, Adachi-Yamada T, 

Hotta M, Tsuboi R, 

Reference Author: Ogawa H; 

Reference Location: Gene 1998;208:123-129. 

Database Reference INTERPRO; IPR002494; 

Comment: High sulfur proteins are cysteine-rich 

proteins synthesized 

Comment: during the differentiation of hair matrix cells, 
and form hair 

Comment: fibers in association with hair keratin 
intermediate filaments [1]. 

Comment: This family has been divided up into four 
regions, with the second 

Comment: region containing 8 copies of a short repeat 
[1]. This family is 

Comment: also known as B2 or KAP1.
Number of members: 17 


ketoacyl-synt 


PDOC00529 


Beta-ketoacyl synthases
active site 


Beta-ketoacyl-ACP synthase (EC 2.3.1.41) (KAS) [1] is the 
enzyme that 

catalyzes the condensation of malonyl-ACP with the growing 
fatty acid chain. 

It is found as a component of the following enzymatic systems:

- Fatty acid synthetase (FAS), which catalyzes the formation of 
long-chain 

fatty acids from acetyl-CoA, malonyl-CoA and NADPH. 
Bacterial and plant 

chloroplast FAS are composed of eight separate subunits which 
correspond to 

different enzymatic activities; beta-ketoacyl synthase is one
of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and

FAS2; the beta-ketoacyl synthase domain is located in the
C-terminal

section of FAS2. Vertebrate FAS consists of a single
multifunctional chain;

the beta-ketoacyl synthase domain is located in the N-terminal 
section [2]. 

- The multifunctional 6-methylsalicylic acid synthase (MSAS) from
Penicillium

patulum [3]. This is a multifunctional enzyme involved in the
biosynthesis

of a polyketide antibiotic and which has a KAS domain in its 
N-terminal 
section. 

- Polyketide antibiotic synthase enzyme systems. Polyketides 
are secondary 

metabolites produced by microorganisms and plants from 
simple fatty acids. 

KAS is one of the components involved in the biosynthesis 
of the 

Streptomyces polyketide antibiotics granaticin [4],
tetracenomycin C [5]
and erythromycin.

- Emericella nidulans multifunctional protein Wa. Wa is
involved in the

biosynthesis of conidial green pigment. Wa is a protein of 216
Kd that
contains a KAS domain. 

- Rhizobium nodulation protein nodE, which probably acts as a 
beta-ketoacyl 

synthase in the synthesis of the nodulation Nod factor fatty acyl 
chain. 

- Yeast mitochondrial protein CEM1. 

The condensation reaction is a two step process: the acyl 
component of an 

activated acyl primer is transferred to a cysteine residue of the 
enzyme and 

is then condensed with an activated malonyl donor with the 
concomitant release 

of carbon dioxide. The sequence around the active site 
cysteine is well 

conserved and can be used as a signature pattern. 

Description of pattern(s) and/or profile(s)

Consensus pattern G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-
[STAG]-x(3)-[LIVMF] [C is the active site residue]
Sequences known to belong to this class detected by the pattern 
ALL, except for bacterial and plant beta-ketoacyl synthase III 
(KAS III). 

Other sequence(s) detected in SWISS-PROT 10. 
Last update 

November 1997 / Text revised. 

References 

[1] 

Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. 
Carlsberg Res. Commun. 53:357-370(1988). 

[2]

Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S.
Eur. J. Biochem. 198:571-579(1991). 

[3] 

Beck J., Ripka S., Siegner A., Schiitz E., Schweizer E. 
Eur. J. Biochem. 192:487-498(1990). 

[4] 

Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. 
EMBO J. 8:2727-2736(1989).

[5]

Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J.,
Hopwood D.A.

EMBO J. 8:2717-2725(1989).



KRAB box



Accession number: PF01352
Definition: KRAB box
Author: Bateman A
Alignment method of seed: Manual
Source of seed members: Bateman A
Gathering cutoffs: 0 0
Trusted cutoffs: 1.10 1.10
Noise cutoffs: -5.40 -5.40
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate -seed 0 HMM
Reference Number: [1]
Reference Medline: 91319563
Reference Title: Conserved KRAB protein domain identified upstream from the
Reference Title: zinc finger region of Kox 8.
Reference Author: Thiesen HJ, Bellefroid E, Revelant O, Martial JA;
Reference Location: Nucleic Acids Res 1991;19:3996-3996.
Reference Number: [2]
Reference Medline: 97140325
Reference Title: A novel member of the RING finger family, KRIP-1,
Reference Title: associates with the KRAB-A transcriptional repressor domain
Reference Title: of zinc finger proteins.
Reference Author: Kim SS, Chen YM, O'Leary E, Witzgall R, Vidal M, Bonventre
Reference Author: JV;
Reference Location: Proc Natl Acad Sci U S A 1996;93:15299-15304.
Reference Number: [3]
Reference Medline: 96365472
Reference Title: KAP-1, a novel corepressor for the highly conserved KRAB
Reference Title: repression domain.
Reference Author: Friedman JR, Fredericks WJ, Jensen DE, Speicher DW, Huang
Reference Author: XP, Neilson EG, Rauscher FJ;
Reference Location: Genes Dev 1996;10:2067-2078.
Database Reference INTERPRO; IPR001909;
Database reference: PFAMB; PB036541;
Comment: The KRAB domain (or Kruppel-associated box) is present in
Comment: about a third of zinc finger proteins containing C2H2 fingers.
Comment: The KRAB domain is found to be involved in protein-protein
Comment: interactions [2,3].
Comment: The KRAB domain is generally encoded by two exons. The
Comment: regions coded by the two exons are known as KRAB-A and
Comment: KRAB-B.
Number of members: 105



lectin legB



PDOC00278



Legume lectins
signatures


Leguminous plants synthesize sugar-binding proteins which are 
called legume 

lectins [1,2]. These lectins are generally found in the seeds. 
The exact 

function of legume lectins is not known but they may be 
involved in the 

attachment of nitrogen-fixing bacteria to legumes and in the 
protection 

against pathogens. Legume lectins bind calcium and 
manganese (or other 
transition metals). 

Legume lectins are synthesized as precursor proteins of about
230 to 260 amino acid residues. Some legume lectins are
proteolytically processed to produce two chains: beta (which
corresponds to the N-terminal) and alpha (C-terminal). The lectin
concanavalin A (conA) from jack bean is exceptional in that the two
chains are transposed and ligated (by formation of a new peptide
bond). The N-terminus of mature conA thus corresponds to that of the
alpha chain and the C-terminus to the beta chain.

We have developed two signature patterns specific to legume
lectins: the first is located in the C-terminal section of the beta
chain and contains a conserved aspartic acid residue important for
the binding of calcium and manganese; the second one is located in
the N-terminal of the alpha chain.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds
manganese and calcium]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 21.

Consensus pattern [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 4.
Last update

July 1999 / Patterns and text revised.

References

[1]

Sharon N., Lis H.

FASEB J. 4:3198-3208(1990).

[2]

Lis H., Sharon N.

Annu. Rev. Biochem. 55:33-37(1986).



ligase-CoA



CoA-ligases


Accession number: PF00549 

Definition: CoA-ligases 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: SCOP 

Gathering cutoffs: 25 25 

Trusted cutoffs: 28.70 28.70 

Noise cutoffs: 14.70 14.70

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 



Reference Number: [1]
Reference Medline: 94193797
Reference Title: The crystal structure of succinyl-CoA synthetase from
Reference Title: Escherichia coli at 2.5-A resolution.
Reference Author: Wolodko WT, Fraser ME, James MN, Bridger WA;
Reference Location: J Biol Chem 1994;269:10883-10890.
Database Reference: SCOP; 1scu; sf; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR000303;
Database Reference PDB; 1cqi A; 132; 279;
Database Reference PDB; 1cqi D; 132; 279;
Database Reference PDB; 1cqj A; 132; 279;
Database Reference PDB; 1cqj D; 132; 279;
Database Reference PDB; 2scu A; 132; 279;
Database Reference PDB; 2scu D; 132; 279;
Database Reference PDB; 1scu A; 132; 279;
Database Reference PDB; 1scu D; 132; 279; 
Database Reference PDB; 1cqi B; 246; 385; 
Database Reference PDB; 1cqi E; 246; 385; 
Database Reference PDB; 1cqj B; 246; 385; 
Database Reference PDB; 1cqj E; 246; 385; 
Database Reference PDB; 2scu B; 246; 385; 
Database Reference PDB; 2scu E; 246; 385; 
Database Reference PDB; 1scu B; 246; 388; 
Database Reference PDB; 1scu E; 246; 388; 
Database reference: PFAMB; PB039724; 
Database reference: PFAMB; PB041236; 
Comment: -!- This family includes the CoA ligases 
Succinyl-CoA synthetase alpha 

Comment: and beta chains, malate CoA ligase and 
ATP-citrate lyase. 

Comment: Some members of the family utilise ATP 

others use GTP. 

Number of members: 76 
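
Each Pfam entry in this table lists gathering, trusted and noise cutoffs as a pair of numbers. The sketch below assumes the conventional Pfam reading, in which the first number is a per-sequence bit-score threshold and the second a per-domain threshold, and a hit is accepted into the family only when both scores reach the gathering cutoff. The values are copied from the PF00549 (CoA-ligases) entry above, reading the noise cutoff as 14.70; the names Cutoffs and passes_gathering are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class Cutoffs(NamedTuple):
    """A pair of bit-score thresholds (assumed order: sequence, domain)."""
    per_sequence: float
    per_domain: float


# Values copied from the PF00549 (CoA-ligases) entry.
GATHERING = Cutoffs(25.0, 25.0)
TRUSTED = Cutoffs(28.70, 28.70)
NOISE = Cutoffs(14.70, 14.70)


def passes_gathering(sequence_score: float, domain_score: float,
                     ga: Cutoffs = GATHERING) -> bool:
    """Return True when both scores reach the gathering thresholds."""
    return sequence_score >= ga.per_sequence and domain_score >= ga.per_domain


# Made-up scores, used only to exercise the function.
print(passes_gathering(30.1, 27.0))
print(passes_gathering(24.9, 30.0))
```

Under this reading the trusted cutoff sits at or above the gathering cutoff and the noise cutoff below it, which is consistent with the three pairs listed for this family.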


LIM_bind 




LIM-domain binding 
protein 


Accession number: PF01803 

Definition: LIM-domain binding protein 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1352 (release 4.2)

Gathering cutoffs: -92 -92

Trusted cutoffs: 13.40 13.40

Noise cutoffs: -197.90 -197.90

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97477378

Reference Title: Chip, a widely expressed chromosomal 
protein required for 

Reference Title: segmentation and activity of a remote wing 
margin enhancer 

Reference Title: in Drosophila. 

Reference Author: Morcillo P, Rosen C, Baylies MK, Dorsett
D;

Reference Location: Genes Dev 1997;11:2729-2740.

Reference Number: [2]

Reference Medline: 97336071 

Reference Title: A family of LIM domain-associated 

cofactors confer 

Reference Title: transcriptional synergism between LIM and 
Otx homeodomain 
Reference Title: proteins. 

Reference Author: Bach I, Carriere C, Ostendorff HP, 
Andersen B, Rosenfeld 
Reference Author: MG; 

Reference Location: Genes Dev 1997;11:1370-1380.
Reference Number: [3] 
Reference Medline: 97078753 

Reference Title: Interactions of the LIM-domain-binding 
factor Ldb1 with LIM 

Reference Title: homeodomain proteins. 

Reference Author: Agulnick AD, Taira M, Breen JJ, Tanaka 

T, Dawid IB, 

Reference Author: Westphal H; 
Reference Location: Nature 1996;384:270-272.
Reference Number: [4]
Reference Medline: 97030257

Reference Title: Nuclear LIM interactor, a rhombotin and
LIM homeodomain 

Reference Title: interacting protein, is expressed early in 
neuronal 

Reference Title: development. 

Reference Author: Jurata LW, Kenny DA, Gill GN; 

Reference Location: Proc Natl Acad Sci U S A 

1996;93:11693-11698. 

Database Reference INTERPRO; IPR002691;
Comment: The LIM-domain binding protein binds to
the LIM domain LIM of

Comment: LIM homeodomain proteins which are
transcriptional regulators of
Comment: development.

Comment: Nuclear LIM interactor (NLI) / LIM domain- 

binding protein 1 (LDB1) 

Comment: Swiss: P70662 is located in the nuclei of 

neuronal cells during 

Comment: development, it is co-expressed with isl1 in 

early motor neuron 

Comment: differentiation and has a suggested role in 

the Isl1 dependent

Comment: development of motor neurons [4]. 

Comment: It is suggested that these proteins act 

synergistically to enhance

Comment: transcriptional efficiency by acting as co- 

factors for LIM homeodomain 

Comment: and Otx class transcription factors both of 

which have essential roles 

Comment: in development [2]. 

Comment: The Drosophila protein Chip Swiss:O18353

is required for segmentation 

Comment: and activity of a remote wing margin 

enhancer [1]. Chip is a ubiquitous 

Comment: chromosomal factor required for normal 

expression of diverse genes at 

Comment: many stages of development [1]. It is 

suggested that Chip cooperates 

Comment: with different LIM domain proteins and other 

factors to structurally 

Comment: support remote enhancer-promoter 

interactions [1]. 

Number of members: 19 



Triglyceride lipases (EC 3.1.1.3) [1] are lipolytic enzymes that
hydrolyze

the ester bond of triglycerides. Lipases are widely distributed in 
animals, 

plants and prokaryotes. In higher vertebrates there are at least 
three tissue- 
specific isozymes: pancreatic, hepatic, and gastric/lingual. These 
three types 

of lipases are closely related to each other as well as to 
lipoprotein lipase 

(EC 3.1.1.34) [2], which hydrolyzes triglycerides of chylomicrons
and very low 

density lipoproteins (VLDL). 

The most conserved region in all these proteins is centered 
around a serine 

residue which has been shown [3] to participate, with a
histidine and an

aspartic acid residue, in a charge relay system. Such a region is
also present 

in lipases of prokaryotic origin and in lecithin-cholesterol 
acyltransferase 

(EC 2.3.1.43) (LCAT) [4], which catalyzes fatty acid transfer 
between 

phosphatidylcholine and cholesterol. We have built a pattern from 
that region. 



Description of pattern (s) and/or profile(s) 

Consensus pattern [LIV]-x-[LIVFY]-[LIVMST]-G-[HYVW]-S-x-G-
[GSTAC] [S is the active site residue]

Sequences known to belong to this class detected by the pattern
ALL. 

Other sequence(s) detected in SWISS-PROT 35. 

Note Drosophila vitellogenins are also related to lipases [5], but 
they have lost their active site serine. 
Last update 

November 1997 / Pattern and text revised.


PDOC00842


Lipolytic enzymes "G-D- 
S-L" family, serine active 
site 


Recently [1], a family of lipolytic enzymes has been 
characterized. This 

family currently consists of the following proteins:

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol
acyltransferase.

- Xenorhabdus luminescens lipase 1.

- Vibrio mimicus arylesterase.

- Escherichia coli acyl-coA thioesterase I (gene tesA).

- Vibrio parahaemolyticus thermolabile hemolysin/atypical
phospholipase.

- Rabbit phospholipase AdRab-B, an intestinal brush border
protein with esterase and phospholipase A/lysophospholipase
activity that could be involved in the uptake of dietary lipids.
AdRab-B contains four repeats of about 320 amino acids.

- Arabidopsis thaliana and Brassica napus anther-specific proline-
rich protein APG.

- A Pseudomonas putida hypothetical protein in trpE-trpG
intergenic region.

A serine has been identified as part of the active site in the
Aeromonas, Vibrio mimicus and Escherichia coli enzymes. It is
located in a conserved sequence motif that can be used as a
signature pattern for these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G
[S is the active site residue]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note this pattern will pick up two of the four repeats in AdRab-B.
The first one is not detected as its sequence has diverged in the
region of the putative active site residue. The last one is also not
detected because it is slightly divergent at the end of the pattern.
Expert(s) to contact by email
Upton C. upton@sol.uvic.ca

Buckley J.T. tbuckley@sol.uvic.ca

Last update

November 1995 / First entry.


Lipoprotein_1


PDOC00013 


Prokaryotic membrane 
lipoprotein lipid 
attachment site 


In prokaryotes, membrane lipoproteins are synthesized with a
precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase
recognizes a conserved sequence and cuts upstream of a cysteine
residue to which a glyceride-fatty acid lipid is attached [1].

Some of the proteins known to undergo such processing currently
include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene
lpp).

- Escherichia coli lipoprotein-28 (gene nlpA).

- Escherichia coli lipoprotein-34 (gene nlpB).

- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene
osmB).

- Escherichia coli osmotically inducible lipoprotein E (gene
osmE).

- Escherichia coli peptidoglycan-associated lipoprotein (gene
pal).

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).

- Escherichia coli copper homeostasis protein cutF (or nlpE).

- Escherichia coli plasmids traT proteins.

- Escherichia coli Col plasmids lysis proteins.

- A number of Bacillus beta-lactamases.

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene
oppA).

- Borrelia burgdorferi outer surface proteins A and B (genes ospA
and ospB).

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7).

- Chlamydia trachomatis outer membrane protein 3 (gene omp3).

- Fibrobacter succinogenes endoglucanase cel-3.

- Haemophilus influenzae proteins Pal and Pep.

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C
(genes vlpABC).

- Neisseria outer membrane protein H.8.

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl.

- Rhodopseudomonas viridis reaction center cytochrome subunit
(gene cytC).

- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

- Streptococcus pneumoniae oligopeptide transport protein A
(gene amiA).

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb).

- Yersinia virulence plasmid protein yscJ.

- Halocyanin from Natronobacterium pharaonis [4], a membrane
associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion).

From the precursor sequences of all these proteins, we derived
a consensus pattern and a set of rules to identify this type of
post-translational modification.

Description of pattern(s) and/or profile(s)

Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-
[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15
and 35 of the sequence in consideration. 2) There must be at
least one Lys or one Arg in the first seven positions of the
sequence.

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT some 100
prokaryotic proteins. Some of them are not membrane
lipoproteins, but at least half of them could be.
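
The lipid attachment site is defined by a consensus pattern plus two positional rules, so a plain pattern match is not sufficient. The sketch below shows one way the three conditions might be combined. It assumes 1-based residue numbering and that the garbled set in the pattern reads [LIVMFYSTAGCQ]; the function name lipid_attachment_sites and the test sequence are illustrative assumptions and do not come from the entry.

```python
import re

# Regex form of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# The lookahead lets overlapping candidate sites be reported.
LIPOBOX = re.compile(
    r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))"
)
PATTERN_LENGTH = 11  # 6 + 2 + 1 + 1 + 1 residues, cysteine last


def lipid_attachment_sites(seq: str) -> list:
    """Return 1-based positions of cysteines meeting the pattern and both rules."""
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for match in LIPOBOX.finditer(seq):
        cys_position = match.start(1) + PATTERN_LENGTH  # 1-based index of C
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites


# Illustrative sequence, chosen only to exercise the rules.
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN"))
```

Checking the Lys/Arg rule first is a small design choice: it lets sequences that cannot qualify be rejected before any pattern scan.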
Last update 

November 1995 / Pattern and text revised.


Lipoprotein_2


PDOC00013 


Prokaryotic membrane 
lipoprotein lipid 
attachment site 


In prokaryotes, membrane lipoproteins are synthesized with a 
precursor signal 

peptide, which is cleaved by a specific lipoprotein signal 
peptidase (signal 

peptidase II). The peptidase recognizes a conserved sequence 
and cuts upstream 

of a cysteine residue to which a glyceride-fatty acid lipid is 
attached [1]. 

Some of the proteins known to undergo such processing 
currently include (for 
recent listings see [1 ,2,3]): 

- Major outer membrane lipoprotein (murein-lipoproteins) (gene
lpp).

- Escherichia coli lipoprotein-28 (gene nlpA).

- Escherichia coli lipoprotein-34 (gene nlpB).

- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene osmB).

- Escherichia coli osmotically inducible lipoprotein E (gene osmE).

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).

- Escherichia coli copper homeostasis protein cutF (or nlpE).

- Escherichia coli plasmids traT proteins.

- Escherichia coli Col plasmids lysis proteins.

- A number of Bacillus beta-lactamases.

- Bacillus subtilis periplasmic oligopeptide-binding protein (gene
oppA).

- Borrelia burgdorferi outer surface proteins A and B (genes ospA
and ospB).

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7).

- Chlamydia trachomatis outer membrane protein 3 (gene omp3).

- Fibrobacter succinogenes endoglucanase cel-3.

- Haemophilus influenzae proteins Pal and Pep.

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37.

- Mycoplasma hyorhinis variant surface antigens A, B, and C
(genes vlpABC).

- Neisseria outer membrane protein H.8.

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl.

- Rhodopseudomonas viridis reaction center cytochrome subunit
(gene cytC).

- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

- Streptococcus pneumoniae oligopeptide transport protein A
(gene amiA).

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb).

- Yersinia virulence plasmid protein yscJ.

- Halocyanin from Natronobacterium pharaonis [4], a membrane
associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion).

From the precursor sequences of all these proteins, we derived
a consensus pattern and a set of rules to identify this type of
post-translational modification.

Description of pattern(s) and/or profile(s)

Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-
[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15
and 35 of the sequence in consideration. 2) There must be at least
one Lys or one Arg in the first seven positions of the sequence.
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT some 100
prokaryotic proteins. Some of them are not membrane
lipoproteins, but at least half of them could be.
Last update 

November 1995 / Pattern and text revised. 
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attachment site 


in prokaryotes, membrane lipoproteins are synthesized with a 
precursor signal 

peptide, which is cleaved by a specific lipoprotein signal 
peptidase (signal 

peptidase il). The peptidase recognizes a conserved sequence 
and cuts upstream 

of a cysteine residue to which a glyceride-fatty acid lipid is 
attached [1]. 

Some of the proteins known to undergo such processing 
currently include (for 
recent listings see [1 ,2,3]): 

- Major outer membrane lipoprotein (murein-lipoproteins) (gene 
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IPP)- 

- Escherichia coli lipoprotein-28 (gene nlpA). 

- Escherichia coii iipoprotein-34 (gene nlpB). 

- Escherichia coli lipoprotein nlpC. 

- Escherichia coli lipoprotein nipD. 

- Escherichia coii osmoticaliy inducible lipoprotein B {gene 
osmB). 

- Escherichia coii osmoticaliy inducible lipoprotein E (gene 
osmE). 

- Escherichia coli peptidoglycan-associated lipoprotein (gene 
pal). 

- Escherichia coli rare lipoproteins A and B (genes rplA and rplB). 

- Escherichia coli copper homeostasis protein cutF (or nlpE). 

- Escherichia coli plasm ids traT proteins. 

- Escherichia coli Col plasmids lysis proteins. 

- A number of Bacillus beta-iactamases. 

- Bacillus subtilis periplasm ic oligopeptide-binding protein (gene 
oppA). 

- Borrelia burgdorferi outer surface proteins A and B (genes ospA 
and ospB). 

- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 
(gene vmp7). 

- Chlamydia trachomatis outer membrane protein 3 (gene omp3). 

- Fibrobacter succinogenes endoglucanase cel-3.

- Haemophilus influenzae proteins Pal and Pcp.

- Klebsiella pullulanase (gene pulA).

- Klebsiella pullulanase secretion protein pulS.

- Mycoplasma hyorhinis protein p37. 

- Mycoplasma hyorhinis variant surface antigens A, B, and C 
(genes vlpABC).

- Neisseria outer membrane protein H.8. 

- Pseudomonas aeruginosa lipopeptide (gene lppL).

- Pseudomonas solanacearum endoglucanase egl.

- Rhodopseudomonas viridis reaction center cytochrome subunit
(gene cytC).

- Rickettsia 17 Kd antigen.

- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.

- Streptococcus pneumoniae oligopeptide transport protein A 
(gene amiA). 

- Treponema pallidum 34 Kd antigen.

- Treponema pallidum membrane protein A (gene tmpA).

- Vibrio harveyi chitobiase (gene chb). 

- Yersinia virulence plasmid protein yscJ.

- Halocyanin from Natrobacterium pharaonis [4], a membrane associated
copper-binding protein. This is the first archaebacterial protein known to be
modified in such a fashion.

From the precursor sequences of all these proteins, we derived a consensus
pattern and a set of rules to identify this type of post-translational
modification.
Description of pattern (s) and/or profile(s) 

Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-
[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 
and 35 of the sequence in consideration. 2) There must be at 
least one Lys or one Arg in the first seven positions of the 
sequence. 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT some 100 
prokaryotic proteins. Some of them are not membrane 
lipoproteins, but at least half of them could be. 
Last update 

November 1995 / Pattern and text revised. 
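
The consensus pattern and the two additional rules given above can be checked mechanically. The following is a minimal sketch, not part of the original description: it assumes the pattern reads {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, and the function names prosite_to_regex and find_lipid_attachment_site, as well as the test sequence, are hypothetical and chosen only for illustration.

```python
import re

# Assumed reading of the consensus pattern quoted in the text.
LIPOPROTEIN_PATTERN = "{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Handles x (any residue), [..] (allowed set), {..} (forbidden set)
    and repeat counts such as (2) or (6,8).
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_lipid_attachment_site(sequence):
    """Return the 1-based position of a candidate lipid-attached Cys, or None."""
    sequence = sequence.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search("[KR]", sequence[:7]):
        return None
    # A lookahead lets overlapping candidate matches be examined.
    regex = re.compile("(?=(" + prosite_to_regex(LIPOPROTEIN_PATTERN) + "))")
    for m in regex.finditer(sequence):
        cys_position = m.start() + len(m.group(1))  # the pattern ends at the Cys
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_position <= 35:
            return cys_position
    return None
```

For the made-up precursor MKKLLIAAALLSSLLLAGCSSNAK the sketch reports the cysteine at position 19. A hit of this kind only marks a candidate site, since the text notes that some sequences matched by the pattern are not membrane lipoproteins.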
References 
Luteo_Vpg




Luteovirus putative VPg 
genome linked protein 


Accession number: PF01659 

Definition: Luteovirus putative VPg genome linked protein 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_970 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 191.70 191.70

Noise cutoffs: -47.90 -47.90 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 94120742 

Reference Title: Soybean dwarf luteovirus contains the third 
variant genome 

Reference Title: type in the luteovirus group. 
Reference Author: Rathjen JP, Karageorgos LE, Habili N,
Waterhouse PM, Symons
Reference Author: RH;

Reference Location: Virology 1994;198:671-679. 
Database Reference INTERPRO; IPR001964; 
Comment: This family consists of several putative 
genome linked proteins. 

Comment: The genomic RNA of luteoviruses are linked 
to virally encoded genome 

Comment: proteins (VPg). Open reading frame 4 is 
thought to encode the VPg 

Comment: in Soybean dwarf luteovirus [1]. 
Comment: Luteoviruses have isometric capsids that
contain a positive strand

Comment: ssRNA genome, they have no DNA stage 
during their replication. 
Number of members: 32 
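
Entries of this kind follow a regular "Key: value" layout, with repeated keys (several "Reference Title:" or "Comment:" lines) and wrapped continuation lines. As an illustration only, the sketch below shows one way such an entry could be read into a dictionary. The key list, the function names parse_entry and cutoff_pair, and the reading of the two cutoff numbers as a (per-sequence, per-domain) pair are assumptions and are not stated in the text.

```python
from collections import defaultdict

# Field names assumed from the entries in this table; extend as needed.
KNOWN_KEYS = {
    "Accession number", "Definition", "Author", "Alignment method of seed",
    "Source of seed members", "Gathering cutoffs", "Trusted cutoffs",
    "Noise cutoffs", "HMM build command line", "Reference Number",
    "Reference Medline", "Reference Title", "Reference Author",
    "Reference Location", "Comment", "Number of members",
}


def parse_entry(text):
    """Collect 'Key: value' lines into a dict, joining repeats and wrapped lines."""
    fields = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, sep, rest = line.partition(":")
        if sep and head in KNOWN_KEYS:
            current = head
            fields[current].append(rest.strip())
        elif current is not None:
            # A wrapped line continues the most recent field.
            fields[current][-1] += " " + line
    return {key: " ".join(values) for key, values in fields.items()}


def cutoff_pair(value):
    """Split a cutoff field such as '25 25' into two floats.

    Treating the pair as (per-sequence, per-domain) scores is an assumption.
    """
    first, second = (float(x) for x in value.split())
    return first, second
```

Restricting field starts to a known key list keeps wrapped lines that happen to contain a colon, such as "Swiss:P21964", from being mistaken for new fields.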


MATH 




MATH domain 


Accession number: PF00917 

Definition: MATH domain 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1602 (release 3.0)

Gathering cutoffs: 17 0

Trusted cutoffs: 17.90 0.20

Noise cutoffs: 11.80 11.80

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96334294 

Reference Title: TRAF proteins and meprins share a 

conserved domain. 

Reference Author: Uren AG, Vaux DL; 

Reference Location: Trends Biochem Sci 1996;21:244-245.

Reference Number: [2] 

Reference Medline: [illegible]

Reference Title: Crystallographic analysis of CD40 

recognition and signaling 

Reference Title: by human TRAF2. 

Reference Author: McWhirter SM, Pullen SS, Holton JM, 

Crute J J, Kehry MR, 

Reference Author: Alber T; 



Attorney No. 2750-1237P 



Reference Location: Proc Natl Acad Sci USA 1999;96:8408-8413.

Reference Number: [3] 
Reference Medline: 99069615 

Reference Title: Comparison of the complete protein sets of 
worm and yeast: 

Reference Title: orthology and divergence. 

Reference Author: Chervitz SA, Aravind L, Sherlock G, Ball 

CA, Koonin EV, 

Reference Author: Dwight SS, Harris MA, Dolinski K, Mohr S,
Smith T, Weng S,

Reference Author: Cherry JM, Botstein D;
Reference Location: Science 1998;282:2022-2028.
Database Reference: SCOP; 1qsc; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002083; 

Database Reference PDB; 1qsc A; 357; 498; 

Database Reference PDB; 1qsc B; 357; 498; 

Database Reference PDB; 1qsc C; 357; 498; 

Database reference: PFAMB; PB018448; 

Database reference: PFAMB; PB040690; 

Database reference: PFAMB; PB041198;

Comment: This motif has been called the Meprin And 

TRAF-Homology 

Comment: (MATH) domain. This domain is hugely
expanded in the nematode 
Comment: C. elegans [3]. 
Number of members: 212 


MCT 




Monocarboxylate 
transporter 


Accession number: PF01587 

Definition: Monocarboxylate transporter 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_483 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 322.90 322.90

Noise cutoffs: -38.20 -38.20

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98087501 

Reference Title: Cloning and sequencing of four new 
mammalian 

Reference Title: monocarboxylate transporter (MCT) 
homologues confirms the 

Reference Title: existence of a transporter family with an 
ancient past. 

Reference Author: Price NT, Jackson VN, Halestrap AP;
Reference Location: Biochem J 1998;329:321-328.
Database Reference INTERPRO; IPR002897; 
Comment: This domain consists of the transmembrane 
region of the monocarboxylate 

Comment: transporters. Monocarboxylate transporters
(MCT) are transmembrane

Comment: glycoproteins with 10-12 predicted 
transmembrane regions. 

Comment: They catalyse the proton linked transport of 
lactic acid, 

Comment: pyruvate and ketone bodies across the 
plasma membrane [1]. 
Number of members: 33 


Methionine_synt




Methionine synthase, 
vitamin-B12 independent 


Accession number: PF01717 

Definition: Methionine synthase, vitamin-B12 independent

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1909 (release 4.1)

Gathering cutoffs: -155.0 -155.0

Trusted cutoffs: -155.00 -155.00

Noise cutoffs: -170.00 -170.00

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 
Reference Medline: 98301657

Reference Title: The specific features of methionine biosynthesis and

Reference Title: metabolism in plants.
Reference Author: Ravanel S, Gakiere B, Job D, Douce R;
Reference Location: Proc Natl Acad Sci USA 1998;95:7805-7812.

Database Reference INTERPRO; IPR002629;

Database reference: PFAMB; PB041617;

Comment: This is a family of vitamin-B12 independent methionine synthases

Comment: or 5-methyltetrahydropteroyltriglutamate-homocysteine

Comment: methyltransferases, EC:2.1.1.14 from bacteria and plants.

Comment: Plants are the only higher eukaryotes that 

have the required enzymes 

Comment: for methionine synthesis [1]. 

Comment: This enzyme catalyses the last step in the 

production of methionine 

Comment: by transferring a methyl group from 5-methyltetrahydrofolate to

Comment: homocysteine [1]. 

Comment: The aligned region makes up the carboxy 

region of the approximately 

Comment: 750 amino acid protein except in some 
hypothetical archaeal proteins 

Comment: present in the family, where this region 

corresponds to the 

Comment: entire length. 

Number of members: 28 


Methyltransf_2




O-methyltransferase


Accession number: PF00891 

Definition: O-methyltransferase 

Previous Pfam IDs: Methyltransf; 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_152 (release 3.0)

Gathering cutoffs: -53 -53 

Trusted cutoffs: -22.00 -22.00 

Noise cutoffs: -84.60 -84.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 93167811

Reference Title: Purification of a 40-kilodalton methyltransferase active in

Reference Title: the aflatoxin biosynthetic pathway.
Reference Author: Keller NP, Dischinger HC, Bhatnagar D,
Cleveland TE, Ullah
Reference Author: AH;

Reference Location: Appl Environ Microbiol 1993;59:479-484.
Database Reference INTERPRO; IPR001077;
Comment: This family includes a range of O- 
methyltransferases. These 

Comment: enzymes utilise S-adenosyl methionine. 
Number of members: 67 


Methyltransf_3




O-methyltransferase 


Accession number: PF01596 

Definition: O-methyltransferase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_749 (release 4.1)

Gathering cutoffs: -86 -86

Trusted cutoffs: -81.80 -81.80

Noise cutoffs: -91.00 -91.00

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97090395 

Reference Title: Two multifunctional peptide synthetases 
and an 

Reference Title: O-methyltransferase are involved in the 
biosynthesis of the 
Reference Title: DNA-binding antibiotic and antitumour
agent saframycin Mx1

Reference Title: from Myxococcus xanthus.
Reference Author: Pospiech A, Bietenhader J, Schupp T;
Reference Location: Microbiology 1996;142:741-746.
Database Reference: SCOP; 1vid; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002935;
Database Reference PDB; 1vid; 13; 186;
Database reference: PFAMB; PB040269;
Comment: Members of this family are O- 
methyltransferases. The family 

Comment: includes catechol o-methyltransferase
Swiss:P21964, caffeoyl-CoA

Comment: O-methyltransferase Swiss:Q43095 and a 
family of bacterial 

Comment: O-methyltransferases that may be involved 
in antibiotic 

Comment: production [1]. 
Number of members: 39 


MMR_HSR1




GTPase of unknown 
function 


Accession number: PF01926 

Definition: GTPase of unknown function 

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: -21 -21 

Trusted cutoffs: -20.70 -20.70 

Noise cutoffs: -31.60 -31.60

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 94235953 

Reference Title: Structure and evolution of a member of a 
new subfamily of 

Reference Title: GTP-binding proteins mapping to the
human MHC class I
Reference Title: region.

Reference Author: Vernet C, Ribouchon MT, Chimini
G, Pontarotti P;

Reference Location: Mamm Genome 1994;5:100-105.
Database Reference INTERPRO; IPR002917;
Database reference: PFAMB; PB000471;
Database reference: PFAMB; PB002171;
Database reference: PFAMB; PB015790; 
Number of members: 67 


MoaC 




MoaC family 


Accession number: PF01967 

Definition: MoaC family 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 73.00 73.00 

Noise cutoffs: -93.90 -93.90 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 99337076 

Reference Title: Characterization of a molybdenum cofactor 
biosynthetic gene 

Reference Title: cluster in Rhodobacter capsulatus which is 
specific for the 

Reference Title: biogenesis of dimethylsulfoxide reductase.
Reference Author: Solomon PS, Shaw AL, Lane I, Hanson 
GR, Palmer T, McEwan 
Reference Author: AG; 

Reference Location: Microbiology 1999;145:1421-1429. 
Database Reference INTERPRO; IPR002820;
Comment: Members of this family are involved in 
molybdenum 

Comment: cofactor biosynthesis. However their 
molecular 

Comment: function is not known.
Number of members: 24


Morbilli_P




Morbillivirus RNA
polymerase alpha
subunit

Accession number: PF01647 

Definition: Morbillivirus RNA polymerase alpha subunit 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_903 (release 4.1)

Gathering cutoffs: -74 -74

Trusted cutoffs: 22.90 22.90

Noise cutoffs: -171.70 -171.70

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 92341068

Reference Title: Sequence analysis of the genes encoding
the nucleocapsid

Reference Title: protein and phosphoprotein (P) of phocid 
distemper virus, 

Reference Title: and editing of the P gene transcript. 
Reference Author: Blixenkrone-Moller M, Sharma B, Varsanyi 
TM, Hu A, Norrby 

Reference Author: E, Kovamees J; 

Reference Location: J. Gen. Virol. 1992;73:885-893. 

Database Reference INTERPRO; IPR002581;

Database reference: PFAMB; PB002389; 

Comment: This family consists of morbillivirus RNA 

polymerase alpha subunit 

Comment: and non structural protein V. The P gene of 
morbillivirus is 

Comment: co-transcriptionally edited leading to the N-terminal

Comment: half of the P protein being appended to the 
C-terminal of the P protein, 

Comment: and a cysteine rich region in the V fusion 
protein which has been 

Comment: shown to bind zinc [see Virology 3rd edition, 

volume 1, chapter 40, 

Comment: pages 1182-1184].

Comment: Morbilliviruses are positive strand ssRNA 

viruses and a part of the 

Comment: paramyxoviridae family, members include 
measles virus and phocine 
Comment: distemper virus. 
Number of members: 52 


Myc_N_term 




Myc amino-terminal 
region 


Accession number: PF01056

Definition: Myc amino-terminal region 

Author: Finn RD, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_387 (release 3.0) 

Gathering cutoffs: -109 -109

Trusted cutoffs: -81.20 -81.20

Noise cutoffs: -137.40 -137.40

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98280742 

Reference Title: The molecular role of Myc in growth and 
transformation: 

Reference Title: recent discoveries lead to new insights. 
Reference Author: Facchini LM, Penn LZ;
Reference Location: FASEB J 1998;12:633-651.
Reference Number: [2]
Reference Medline: 97318600
Reference Title: Myc target genes.
Reference Author: Grandori C, Eisenman RN;
Reference Location: Trends Biochem Sci 1997;22:177-181.
Database Reference INTERPRO; IPR002418;
Comment: The myc family belongs to the basic helix- 
loop-helix leucine zipper 

Comment: class of transcription factors, see HLH. Myc 
forms a 

Comment: heterodimer with Max, and this complex 
regulates cell growth through 
Comment: direct activation of genes involved in cell
replication [2].

Number of members: 56 


Myosin_tail 




Myosin tail 


Accession number: PF01576

Definition: Myosin tail 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_356 (release 4.1)

Gathering cutoffs: 19 19 

Trusted cutoffs: 23.30 23.30 

Noise cutoffs: 15.10 15.10 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 87060988 

Reference Title: Complete nucleotide and encoded amino 
acid sequence of a 

Reference Title: mammalian myosin heavy chain gene. 
Evidence against 

Reference Title: intron-dependent evolution of the rod.
Reference Author: Strehler EE, Strehler-page M-A, Perriard
JC, Periasamy M,

Reference Author: Nadal-ginard B;
Reference Location: J Mol Biol 1986;190:291-317.
Database Reference INTERPRO; IPR002928; 
Comment: The myosin molecule is a multi-subunit 
complex made up 

Comment: of two heavy chains and four light chains; it is
a fundamental contractile

Comment: protein found in all eukaryote cell types [1].
Comment: This family consists of the coiled-coil myosin
heavy chain tail region.

Comment: The coiled-coil is composed of the tail from 
two molecules of myosin. 

Comment: These can then assemble into the 
macromolecular thick filament [1]. 

Comment: The coiled-coil region provides the structural backbone of the thick

Comment: filament [1].

Number of members: 182


Na_Ala_symp


PDOC00681 


Sodium:alanine
symporter family
signature


It has been shown [1] that integral membrane proteins that 
mediate the intake 

of a wide variety of molecules with the concomitant uptake of 
sodium ions 

(sodium symporters) can be grouped, on the basis of sequence
and functional 

similarities into a number of distinct families. One of these 
families is 

known as the sodium:alanine symporter family (SAF) and 
currently consists of 
the following proteins: 

- Thermophilic bacterium PS-3 alanine carrier protein (ACP). 
ACP can use both 

sodium and hydrogen as a symport ion.

- Alteromonas haloplanktis D-alanine/glycine permease (gene
dagA). 

- Bacillus subtilis alsT. 

- Hypothetical protein yaaJ from Escherichia coli and
HI0183, the

corresponding Haemophilus influenzae protein.

- Haemophilus influenzae hypothetical protein HI0883.

These integral membrane proteins are predicted to comprise at
least eight

membrane spanning domains. As a signature pattern we 
selected a highly 

conserved region which is located in the N-terminal section and 
which includes 

part of the first transmembrane region. 
Description of pattern(s) and/or profile(s)

Consensus pattern G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-
[STAV]-[LIVMFA](2)-G

Sequences known to belong to this class detected by the pattern
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Reizer J., Reizer A., Saier M.H. Jr. 
Biochim. Biophys. Acta 1197:133-136(1994). 


Na_Ca_Ex




Sodium/calcium 
exchanger protein 


Accession number: PF01699

Definition: Sodium/calcium exchanger protein

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1680 (release 4.1)

Gathering cutoffs: 3 3

Trusted cutoffs: 3.40 3.40

Noise cutoffs: 1.20 1.20

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96394663 

Reference Title: Cloning of a third mammalian Na+-Ca2+ 
exchanger, NCX3. 

Reference Author: Nicoll DA, Quednau BD, Qui Z, Xia YR, 
Lusis AJ, Philipson 
Reference Author: KD; 

Reference Location: J Biol Chem 1996;271:24914-24921.
Reference Number: [2]
Reference Medline: 91047958

Reference Title: Molecular cloning and functional expression
of the cardiac

Reference Title: sarcolemmal Na(+)-Ca2+ exchanger.
Reference Author: Nicoll DA, Longoni S, Philipson KD;
Reference Location: Science 1990;250:562-565.
Database Reference INTERPRO; IPR002613;
Database reference: PFAMB; PB002768;
Database reference: PFAMB; PB040773; 
Database reference: PFAMB; PB041540; 
Comment: This is a family of sodium/calcium 
exchanger integral membrane 

Comment: proteins. This family covers the integral 
membrane regions of 

Comment: the proteins. Sodium/calcium exchangers 
regulate intracellular Ca2+ 

Comment: concentrations in many cells; cardiac 
myocytes, epithelial cells, 

Comment: neurons, retinal rod photoreceptors and
smooth muscle cells [2]. 

Comment: Ca2+ is moved into or out of the cytosol 
depending on Na+ concentration 

Comment: [2]. In humans and rats there are 3
isoforms; NCX1, NCX2 and NCX3 [1]

Comment: see Swiss:Q01728, Swiss:P48768 and
Swiss:P70549 respectively.
Number of members: 105


Na_Galacto_symp


PDOC00680


Sodium:galactoside
symporter family
signature


It has been shown [1] that integral membrane proteins that 
mediate the intake 

of a wide variety of molecules with the concomitant uptake of 
sodium ions 

(sodium symporters) can be grouped, on the basis of sequence 
and functional 

similarities into a number of distinct families. One of these 
families is 

known as the sodium:galactoside symporter family (SGF) and 

currently consists 

of the following proteins: 



Na K ATPase N 



- The melibiose carrier (gene melB) from a variety of enterobacteria. This
protein is responsible for melibiose transport and is capable of using
hydrogen, sodium, and lithium cations as coupling cations for cotransport.

- The lactose permease from Lactobacillus (gene lacS or lacY). This protein
is responsible for the transport of beta-galactosides into the cell, with
the concomitant export of a proton. It consists of two domains; a N-terminal
SGF domain and a C-terminal domain that resembles that of enzyme
IIA of the PEP:sugar phosphotransferase system.

- The raffinose permease from Pediococcus pentosaceus. It also consists of a
N-terminal SGF domain and a C-terminal IIA domain.

- The glucuronide carrier (gene gusB or uidP) from Escherichia coli.

- The xylose transporter (gene xylP) from Lactobacillus pentosus.

- Escherichia coli hypothetical protein yagG.

- Escherichia coli hypothetical protein yicJ.

- Escherichia coli hypothetical protein yihO.

- Escherichia coli hypothetical protein yihP.

- Bacillus subtilis hypothetical protein yjmB.

- Bacillus subtilis hypothetical protein ynaJ.

Like sugar transport proteins, these integral membrane proteins 
are predicted 

to comprise twelve membrane spanning domains. As a
signature pattern we 

selected a highly conserved region which is located in a 
cytoplasmic loop 

between the second and third transmembrane regions. This 
region starts with 

a conserved aspartate which has been shown [2], in melB, to be 

important for 

the activity of the protein. 

Description of pattern (s) and/or profile(s) 

Consensus pattern [DG]-x(3)-G-x(3)-[DN]-x(6,8)-[GA]-[KRHQ]-
[FSA]-[KR]-[PT]-[FYW]-[LIVMWQ]-[LIV]-x-[GAFV]-[GSTA]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

July 1999 / Pattern and text revised. 

References 

[1]

Reizer J., Reizer A., Saier M.H. Jr.
Biochim. Biophys. Acta 1197:133-136(1994).

[2]

Pourcher T., Deckert M., Bassilana M., Leblanc G.
Biochem. Biophys. Res. Commun. 178:1176-1181(1991).



Na+/K+ ATPase C-terminus

This domain is specific to the sodium and potassium ATPases
(Na_K-ATPase).

The sodium pump (Na+,K+ ATPase), located in the plasma
membrane of all animal cells [1], is a heterotrimer of a catalytic
subunit (alpha chain), a glycoprotein subunit of about 34 Kd (beta
chain) and a small hydrophobic protein of about 6 Kd. The beta
subunit seems [2] to regulate, through the assembly of alpha/beta
heterodimers, the number of sodium pumps transported to the
plasma membrane.

This family is typically found in association with E1-E2
ATPase. Uses of these polypeptides include regulating the ion
content in a desired cell or organism, and they can convey salt or ion
tolerance.



Na+/K+ ATPase C-
terminus


Accession number: PF00690


Definition: Na+/K+ ATPase C-terminus 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_138 (release 2.1)

Gathering cutoffs: 15.6 15.6

Trusted cutoffs: 15.60 15.60

Noise cutoffs: 15.10 15.10

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR000661;

Database reference: PFAMB; PB000031;

Comment: This family is always found in association with E1-E2_ATPase.

Comment: This extension is specific to the Na+/K+ 
ATPase subfamily of 
Comment: ATPases. 
Number of members: 90 


NAD_Gly3P_dh 


PDOC00740 


NAD-dependent glycerol-3-phosphate
dehydrogenase signature


NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes
the reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate.
It is a eukaryotic cytosolic homodimeric protein of
about 40 Kd. As 

a signature pattern we selected a glycine-rich region that is 
probably [1] 

involved in NAD-binding. 

Description of pattern (s) and/or profile(s) 

Consensus pattern G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-

[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-G-x-N

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Otto J., Argos P., Rossmann M.G. 
Eur. J. Biochem. 109:325-330(1980). 


NifU_N 




NifU-like N terminal 
domain 


Accession number: PF01592

Definition: NifU-like N terminal domain

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_772 (release 4.1)

Gathering cutoffs: -13 -13

Trusted cutoffs: 1.20 1.20

Noise cutoffs: -28.80 -28.80

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97032601 

Reference Title: A modular domain of NifU, a nitrogen 
fixation cluster 

Reference Title: protein, is highly conserved in evolution. 
Reference Author: Hwang DM, Dempsey A, Tan KT, Liew
CC;

Reference Location: J Mol Evol 1996;43:536-540.
Database Reference INTERPRO; IPR002871;
Comment: This domain is found in NifU in combination
with NifU-like.

Comment: This domain is found in isolation in several
bacterial species

Comment: such as Swiss:O53156. The nif genes are
responsible for nitrogen 

Comment: fixation. However this domain is found in 
bacteria that do not 

Comment: fix nitrogen, so it may have a broader 

significance in the cell 

Comment: than nitrogen fixation. 
Number of members: 32


NLPC_P60 




NLP/P60 family


Accession number: PF00877
Definition: NLP/P60 family
Author: Bateman A

Alignment method of seed: HMM_built_from_alignment

Source of seed members: Pfam-B_292 (release 3.0)

Gathering cutoffs: -9 -9

Trusted cutoffs: -8.30 -8.30

Noise cutoffs: -10.40 -10.40

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR000064;

Database reference: PFAMB; PB024706;

Comment: The function of this domain is unknown. It is found

Comment: in several lipoproteins.
Number of members: 54


NTR 




NTR/C345C module 


Accession number: PF01759 

Definition: NTR/C345C module 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: [1] 

Gathering cutoffs: 25 25 

Trusted cutoffs: 57.30 57.30 

Noise cutoffs: 2.80 2.80 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 99379676 

Reference Title: The NTR module: domains of netrins, 
secreted frizzled 

Reference Title: related proteins, and type I procollagen C-
proteinase

Reference Title: enhancer protein are homologous with
tissue inhibitors of

Reference Title: metalloproteases [In Process Citation]

Reference Author: Banyai L, Patthy L;

Reference Location: Protein Sci 1999;8:1636-1642.

Database Reference INTERPRO; IPR001134;

Database reference: PFAMB; PB005955; 

Comment: We have not included the related TIMP 

family. 

Comment: It has been suggested that the common
function of these 

Comment: modules is binding to metzincins [1]. A 
subset of this family 

Comment: is known as the C345C domain because it 

occurs in complement 

Comment: C3, C4 and C5. 

Number of members: 64 


Nucleoside_tran




Nucleoside transporter 


Accession number: PF01733 

Definition: Nucleoside transporter 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_2135 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 25.50 25.50

Noise cutoffs: -122.50 -122.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98148080

Reference Title: Cloning of the human equilibrative,

Reference Title: nitrobenzylmercaptopurine riboside (NBMPR)-insensitive

Reference Title: nucleoside transporter ei by functional 
expression in a 

Reference Title: transport-deficient cell line. 

Reference Author: Crawford CR, Patel DH, Naeve C, Belt 

JA; 

Reference Location: J Biol Chem 1998;273:5288-5293.
Reference Number: [2] 
Reference Medline: 98019212
Reference Title: Molecular cloning and functional characterization of
Reference Title: nitrobenzylthioinosine (NBMPR)-sensitive (es) and
Reference Title: NBMPR-insensitive (ei) equilibrative nucleoside transporter
Reference Title: proteins (rENT1 and rENT2) from rat tissues.
Reference Author: Yao SY, Ng AM, Muzyka WR, Griffiths M, Cass CE, Baldwin SA,
Reference Author: Young JD;
Reference Location: J Biol Chem 1997;272:28423-28430.
Database Reference INTERPRO; IPR002259;
Comment: This is a family of nucleoside transporters.
Comment: In mammalian cells nucleoside transporters transport nucleoside
Comment: across the plasma membrane and are essential for nucleotide
Comment: synthesis via the salvage pathways for cells that lack their own
Comment: de novo synthesis pathways [2].
Comment: Also in this family is mouse and human nucleolar protein HNP36
Comment: Swiss:Q14542 a protein of unknown function; although it has been
Comment: hypothesized to be a plasma membrane nucleoside transporter [2].
Number of members: 15


Orbi_VP6


Orbivirus helicase VP6


Accession number: PF01516
Definition: Orbivirus helicase VP6
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_765 (release 4.0)
Gathering cutoffs: -68 -68
Trusted cutoffs: -37.10 -37.10
Noise cutoffs: -98.90 -98.90
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97456481
Reference Title: Bluetongue virus VP6 protein binds ATP and exhibits an
Reference Title: RNA-dependent ATPase function and a helicase activity that
Reference Title: catalyze the unwinding of double-stranded RNA substrates.
Reference Author: Stauber N, Martinez-Costas J, Sutton G, Monastyrskaya K,
Reference Author: Roy P;
Reference Location: J Virol 1997;71:7220-7226.
Database Reference INTERPRO; IPR001399;
Comment: The VP6 protein a minor protein in the core of the virion
Comment: is probably the viral helicase [1].
Number of members: 27


OSCP


PDOC00327


ATP synthase delta
(OSCP) subunit
signature



ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component
of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria,
and the thylakoid membrane of chloroplasts. The ATPase complex is composed of
an oligomeric transmembrane sector, called CF(0), which acts as a proton
channel and a catalytic core, termed coupling factor CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria
and chloroplasts or the Oligomycin Sensitivity Conferral Protein (OSCP) in
mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It
either transmits conformational changes from CF(0) into CF(1) or is involved
in proton conduction [3].

The different delta/OSCP subunits are proteins of approximately 200 amino-acid
residues - once the transit peptide has been removed in the chloroplast and
mitochondrial forms - which show only moderate sequence homology.

The signature pattern used to detect ATPase delta/OSCP subunits is based on a
conserved region in the C-terminal section of these proteins.
Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-
x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-[LIVM]-[KRHENQ]-x-[GSEN]

Sequences known to belong to this class detected by the pattern 

ALL, except 3 sequences. 

Other sequence(s) detected in SWISS-PROT 2. 

Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Futai M., Noumi T., Maeda M.

Annu. Rev. Biochem. 58:111-136(1989). 

[2] 

Senior A.E. 

Physiol. Rev. 68:177-231(1988). 
[3] 

Engelbrecht S., Junge W. 

Biochim. Biophys. Acta 1015:379-390(1990). 


OTCace 


PDOC00091 


Aspartate and ornithine
carbamoyltransferases
signature


Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion
of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step
in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a
regulatory chain (gene pyrI), while in eukaryotes it is a domain in a
multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and
CAD in mammals [2]) that also catalyzes other steps of the biosynthesis of
pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion
of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme
participates in the urea cycle [3] and is located in the mitochondrial
matrix. In prokaryotes and eukaryotic microorganisms it is involved in the
biosynthesis of arginine. In some bacterial species it is also involved in the
degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The
predicted secondary structures of both enzymes are similar and there are some
regions of sequence similarity. One of these regions includes three
residues which have been shown, by crystallographic studies [6], to be
implicated in binding the phosphoryl group of carbamoyl phosphate. We have
selected this region as a signature for these enzymes.

Description of pattern (s) and/or profile(s) 

Consensus pattern F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T
bind carbamoyl phosphate]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note: the residue in position 3 of the pattern makes it possible to distinguish
between an ATCase (Glu) and an OTCase (Lys).
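
The consensus pattern and the note on position 3 can be illustrated with a minimal Python sketch. The helper names `prosite_to_regex` and `classify` and the two test sequences are hypothetical and are not part of the PROSITE entry; the translation covers only the pattern elements that appear in this document (single residues, `x`, `[..]` classes, `{..}` exclusions and repeat counts).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the residue specification from an optional repeat, e.g. (3) or (13,14).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                    # x: any residue
            regex = "."
        elif core.startswith("{"):         # {ABC}: any residue except A, B or C
            regex = "[^" + core[1:-1] + "]"
        else:                              # a single residue or an [ABC] class
            regex = core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(regex)
    return "".join(pieces)

# Signature taken from the entry above.
SIGNATURE = re.compile(prosite_to_regex("F-x-[EK]-x-S-[GT]-R-T"))

def classify(sequence: str):
    """Return 'ATCase', 'OTCase', or None when the signature is absent."""
    hit = SIGNATURE.search(sequence)
    if hit is None:
        return None
    # Position 3 of the pattern: Glu suggests an ATCase, Lys an OTCase.
    return "ATCase" if hit.group()[2] == "E" else "OTCase"

# Hypothetical sequences, made up for illustration only.
print(classify("MKFAEPSTRTLL"))
print(classify("MKFNKPSTRTLL"))
```

A signature match only suggests family membership; the entry itself reports which known sequences the pattern detects.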
Last update

October 1993 / Text revised.


oxidored_q1_N




NADH-Ubiquinone
oxidoreductase (complex
I), chain 5 N-terminus


Accession number: PF00662 

Definition: NADH-Ubiquinone oxidoreductase (complex 

I), chain 5 N-terminus 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_22 (release 2.1) 

Gathering cutoffs: 18 18 

Trusted cutoffs: 19.40 19.40
Noise cutoffs: 16.70 16.70
HMM build command line: hmmbuild -f HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 93110040
Reference Title: The NADH:ubiquinone oxidoreductase (complex I) of
Reference Title: respiratory chains.
Reference Author: Walker JE;
Reference Location: Q Rev Biophys 1992;25:253-324.
Database Reference INTERPRO; IPR001516;
Database reference: PFAMB; PB000410;
Database reference: PFAMB; PB033295;
Database reference: PFAMB; PB040550;
Comment: This sub-family represents an amino terminal extension
Comment: of oxidored_q1. Only NADH-Ubiquinone chain 5 and
Comment: eubacterial chain L are in this family.
Comment: This sub-family is part of complex I which catalyses the
Comment: transfer of two electrons from NADH to ubiquinone in a
Comment: reaction that is associated with proton translocation
Comment: across the membrane.
Number of members: 546 


oxidored_q2 




NADH-
ubiquinone/plastoquinone
oxidoreductase chain
4L


Accession number: PF00420 

Definition: NADH-ubiquinone/plastoquinone 

oxidoreductase chain 4L 

Author: Finn RD 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_193 (release 1.0)
Gathering cutoffs: 25 15
Trusted cutoffs: 29.70 29.70
Noise cutoffs: 20.40 20.40
HMM build command line: hmmbuild -f HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Database Reference INTERPRO; IPR001133;
Database reference: PFAMB; PB006066;
Number of members: 219


PAN 


PDOC00376 


Apple domain 


Plasma kallikrein (EC 3.4.21.34) and coagulation factor XI (EC 3.4.21.27) are
two related plasma serine proteases activated by factor XIIA and which share
the same domain topology: an N-terminal region that contains four tandem
repeats of about 90 amino acids and a C-terminal catalytic domain.

The 90 amino-acid repeated domain contains 6 conserved cysteines. It has been
shown [1,2] that three disulfide bonds link the first and sixth, second and
fifth, and third and fourth cysteines. The domain can be drawn in the shape of
an apple (see below) and has been accordingly called the apple domain.

[Schematic representation of an apple domain.]
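
The disulfide connectivity described above (first to sixth, second to fifth, third to fourth cysteine) can be written down as a short Python sketch. The function name `apple_disulfide_pairs` and the toy sequence are hypothetical illustrations, not part of the PROSITE entry, and the sketch assumes the input is a single apple domain containing exactly six cysteines.

```python
def apple_disulfide_pairs(domain: str):
    """Return 1-based positions of the cysteine pairs joined by each disulfide bond."""
    cys = [i + 1 for i, residue in enumerate(domain) if residue == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # Connectivity stated in the text: 1st-6th, 2nd-5th, 3rd-4th.
    return [(cys[0], cys[5]), (cys[1], cys[4]), (cys[2], cys[3])]

# Hypothetical toy sequence with cysteines at positions 1, 4, 7, 10, 13 and 16.
toy_domain = "CAACAACAACAACAAC"
print(apple_disulfide_pairs(toy_domain))  # [(1, 16), (4, 13), (7, 10)]
```

The nested pairing is what gives the domain its closed, apple-like outline when drawn.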

Apart from the cysteines, there are a number of other conserved 
positions in 

the apple domain. We have developed a pattern, that spans the 
complete domain, 

and which includes these conserved positions. 
Description of pattern (s) and/or profile(s) 

Consensus pattern C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-
[LIVMFY]-x(10)-C-x(3)-C-T-x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-
C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-S-G-x-[ST]-[LIVMFY]-x(2)-C
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

June 1992 / Pattern and text revised.
References
[1]

McMullen B.A., Fujikawa K., Davie E.W.
Biochemistry 30:2050-2056(1991).

[2]

McMullen B.A., Fujikawa K., Davie E.W.
Biochemistry 30:2056-2060(1991).


PAP2 




PAP2 superfamily 


Accession number: PF01569
Definition: PAP2 superfamily
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_486 (release 4.0)
Gathering cutoffs: 16 16
Trusted cutoffs: 22.00 22.00
Noise cutoffs: 11.40 11.40
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97194074
Reference Title: Identification of a novel phosphatase sequence motif.
Reference Author: Stukey J, Carman GM;
Reference Location: Protein Sci 1997;6:469-472.
Reference Number: [2]
Reference Medline: 97406916
Reference Title: An unexpected structural relationship between integral
Reference Title: membrane phosphatases and soluble haloperoxidases.
Reference Author: Neuwald AF;
Reference Location: Protein Sci 1997;6:1764-1767.
Database Reference INTERPRO; IPR000326;
Database reference: PFAMB; PB021113;
Database reference: PFAMB; PB040926;
Database reference: PFAMB; PB041096;
Database reference: PFAMB; PB041301;
Comment: This family includes the enzyme type 2 phosphatidic acid
Comment: phosphatase (PAP2).
Number of members: 49 


PAPS_reduct 




Phosphoadenosine 
phosphosulfate 
reductase family 


Accession number: PF01507 

Definition: Phosphoadenosine phosphosulfate reductase 
family 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw
Source of seed members: Pfam-B_590 (release 4.0)
Gathering cutoffs: 49 49
Trusted cutoffs: 55.40 55.40
Noise cutoffs: -34.60 -34.60
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97411695
Reference Title: Crystal structure of phosphoadenylyl sulphate (PAPS)
Reference Title: reductase: a new family of adenine nucleotide alpha
Reference Title: hydrolases.
Reference Author: Savage H, Montoya G, Svensson C, Schwenn JD, Sinning I;
Reference Location: Structure 1997;5:895-906.
Reference Number: [2]
Reference Medline: 96061968
Reference Title: Reaction mechanism of thioredoxin:
Reference Title: 3'-phospho-adenylylsulfate reductase investigated by
Reference Title: site-directed mutagenesis.
Reference Author: Berendt U, Haverkamp T, Prior A, Schwenn JD;
Reference Location: Eur J Biochem 1995;233:347-356.
Reference Number: [3]
Reference Medline: 91066949
Reference Title: ATP sulphurylase activity of the nodP and nodQ gene
Reference Title: products of Rhizobium meliloti.
Reference Author: Schwedock J, Long SR;
Reference Location: Nature 1990;348:644-647.
Database Reference: SCOP; 1sur; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR002500;
Database Reference PDB; 1sur ; 48; 215;
Comment: This domain is found in phosphoadenosine phosphosulfate (PAPS) reductase
Comment: enzymes or PAPS sulfotransferase. PAPS reductase is part of the adenine
Comment: nucleotide alpha hydrolases superfamily also including N type ATP PPases
Comment: and ATP sulphurylases [1]. The enzyme uses thioredoxin as an electron
Comment: donor for the reduction of PAPS to phospho-adenosine-phosphate (PAP) [1,2].
Comment: It is also found in NodP nodulation protein P from Rhizobium which has ATP
Comment: sulphurylase activity (sulfate adenylate transferase) [3].

Number of members: 48 


PARP 




Poly(ADP-ribose) 
polymerase catalytic 
region 


Accession number: PF00644 

Definition: Poly(ADP-ribose) polymerase catalytic region. 
Author: Bateman A 

Alignment method of seed: HMM_built_from_alignment
Source of seed members: Bateman A
Gathering cutoffs: -59.4 -59.4
Trusted cutoffs: -44.60 -44.60
Noise cutoffs: -180.60 -180.60
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 96353841
Reference Title: Structure of the catalytic fragment of poly(AD-ribose)
Reference Title: polymerase from chicken.
Reference Author: Ruf A, Mennissier de Murcia J, de Murcia G, Schulz GE;
Reference Location: Proc Natl Acad Sci U S A 1996;93:7481-7485.
Reference Number: [2]
Reference Medline: 93293867
Reference Title: The carboxyl-terminal domain of human poly(ADP-ribose)
Reference Title: polymerase. Overproduction in Escherichia coli, large scale
Reference Title: purification, and characterization.
Reference Author: Simonin F, Hofferer L, Panzeter PL, Muller S, de Murcia G,
Reference Author: Althaus FR;
Reference Location: J Biol Chem 1993;268:13454-13461.
Database Reference: SCOP; 1paw; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR001290;
Database Reference PDB; 1a26 ; 662; 997;
Database Reference PDB; 1pax ; 662; 997;
Database Reference PDB; 2pax ; 662; 997;
Database Reference PDB; 3pax ; 662; 997;
Database Reference PDB; 4pax ; 662; 997;
Database Reference PDB; 2paw ; 662; 1009;
Database reference: PFAMB; PB041409;
Comment: Poly(ADP-ribose) polymerase catalyses the covalent
Comment: attachment of ADP-ribose units from NAD+ to itself and
Comment: to a limited number of other DNA binding proteins, which
Comment: decreases their affinity for DNA.
Comment: Poly(ADP-ribose) polymerase is a regulatory component
Comment: induced by DNA damage.
Comment: The carboxyl-terminal region is the most highly conserved
Comment: region of the protein. Experiments have shown that a
Comment: carboxyl 40 kDa fragment is still catalytically active [2].
Number of members: 19


PC_rep 




Proteasome/cyclosome 
repeat 


Accession number: PF01851
Definition: Proteasome/cyclosome repeat
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: [1]
Gathering cutoffs: 25 0
Trusted cutoffs: 30.60 3.00
Noise cutoffs: 15.80 15.80
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97348748
Reference Title: A repetitive sequence in subunits of the 26S proteasome and
Reference Title: 20S cyclosome (anaphase-promoting complex).
Reference Author: Lupas A, Baumeister W, Hofmann K;
Reference Location: Trends Biochem Sci 1997;22:195-196.
Database Reference INTERPRO; IPR002015;
Database reference: PFAMB; PB009978; 
Database reference: PFAMB; PB040656; 
Number of members: 112 


PE 




PE family 


Accession number: PF00934 

Definition: PE family 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_253 (release 3.0)
Gathering cutoffs: -20 -20
Trusted cutoffs: -10.80 -10.80
Noise cutoffs: -20.60 -20.60
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98295987
Reference Title: Deciphering the biology of Mycobacterium tuberculosis from
Reference Title: the complete genome sequence.
Reference Author: Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C,
Reference Author: Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Reference Author: Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T,
Reference Author: Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin
Reference Author: N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al;
Reference Location: Nature 1998;393:537-544.
Database Reference INTERPRO; IPR000084;
Comment: This family is named after a PE motif near to the amino
Comment: terminus of the domain. The PE family of proteins
Comment: all contain an amino-terminal region of about 110
Comment: amino acids. The carboxyl termini of this family
Comment: are variable and fall into several classes. The
Comment: largest class of PE proteins is the highly repetitive
Comment: PGRS class which have a high glycine content.
Comment: The function of these proteins is uncertain but it
Comment: has been suggested that they may be related to
Comment: antigenic variation of Mycobacterium tuberculosis [1].

Number of members: 90 


Pep_deformylase




Polypeptide deformylase 


Accession number: PF01327 

Definition: Polypeptide deformylase 

Author: Bateman A 

Alignment method of seed: Clustalw
Source of seed members: Sarah Teichmann
Gathering cutoffs: 25 25
Trusted cutoffs: 157.40 157.40
Noise cutoffs: -29.00 -29.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97002011
Reference Title: A new subclass of the zinc metalloproteases superfamily
Reference Title: revealed by the solution structure of peptide
Reference Title: deformylase.
Reference Author: Meinnel T, Blanquet S, Dardel F;
Reference Location: J Mol Biol 1996;262:375-386.
Reference Number: [2]
Reference Medline: 98332750
Reference Title: Solution structure of nickel-peptide deformylase.
Reference Author: Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;
Reference Location: J Mol Biol 1998;280:501-513.
Database Reference: SCOP; 1def; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR000181;
Database Reference PDB; 2def ; 4; 142;
Database Reference PDB; 1def ; 4; 142;
Database Reference PDB; 1dff ; 4; 142;
Database Reference PDB; 1bsj A; 4; 142;
Database Reference PDB; 1bsk A; 4; 142;
Database Reference PDB; 1bs4 A; 4; 142;
Database Reference PDB; 1bs4 B; 504; 642;
Database Reference PDB; 1bs4 C; 1004; 1142;
Database Reference PDB; 1bs5 A; 4; 142;
Database Reference PDB; 1bs5 B; 504; 642;
Database Reference PDB; 1bs5 C; 1004; 1142;
Database Reference PDB; 1bs6 A; 4; 142;
Database Reference PDB; 1bs6 B; 504; 642;
Database Reference PDB; 1bs6 C; 1004; 1142;
Database Reference PDB; 1bs7 A; 4; 142;
Database Reference PDB; 1bs7 B; 504; 642;
Database Reference PDB; 1bs7 C; 1004; 1142;
Database Reference PDB; 1bs8 A; 4; 142;
Database Reference PDB; 1bs8 B; 504; 642;
Database Reference PDB; 1bs8 C; 1004; 1142;
Database Reference PDB; 1bsz A; 4; 142;
Database Reference PDB; 1bsz B; 504; 642;
Database Reference PDB; 1bsz C; 1004; 1142;
Database Reference PDB; 1icj A; 4; 142;
Database Reference PDB; 1icj B; 504; 642;
Database Reference PDB; 1icj C; 1004; 1142;
Database reference: PFAMB; PB041251;
Number of members: 25 


Peptidase_C15




Pyroglutamyl peptidase


Accession number: PF01470
Definition: Pyroglutamyl peptidase
Author: Bateman A
Alignment method of seed: Clustalw_manual
Source of seed members: [1]
Gathering cutoffs: 25 25
Trusted cutoffs: 436.10 436.10
Noise cutoffs: -155.40 -155.40
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 99216536
Reference Title: The crystal structure of pyroglutamyl peptidase I from
Reference Title: bacillus amyloliquefaciens reveals a new structure for a
Reference Title: cysteine protease.
Reference Author: Odagaki Y, Hayashi A, Okada K, Hirotsu K, Kabashima T, Ito
Reference Author: K, Yoshimoto T, Tsuru D, Sato M, Clardy J;
Reference Location: Structure 1999;7:399-411.
Database Reference: SCOP; 1aug; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference MEROPS; C15;
Database Reference INTERPRO; IPR000816;
Database Reference PDB; 1a2z A; 2; 209;
Database Reference PDB; 1a2z B; 2; 209;
Database Reference PDB; 1a2z C; 2; 209;
Database Reference PDB; 1a2z D; 2; 209;
Database Reference PDB; 1aug A; 3; 204;
Database Reference PDB; 1aug B; 213; 414;
Database Reference PDB; 1aug C; 423; 624;
Database Reference PDB; 1aug D; 633; 834;
Number of members: 10


Peptidase_M20 


PDOC00613 


ArgE / dapE / ACY1 / 
CPG2 / yscS family 
signatures 


The following enzymes have been shown [1,2,3] to be evolutionarily and
functionally related:

- In the biosynthetic pathway from glutamate to arginine, the removal of an
  acetyl group from N2-acetylornithine can be catalyzed via two distinct
  enzymatic strategies depending on the organism. In some bacteria and in
  fungi, the acetyl group is transferred on glutamate by glutamate
  acetyltransferase (EC 2.3.1.35) while in enterobacteria such as Escherichia
  coli, it is hydrolyzed by acetylornithine deacetylase (EC 3.5.1.16)
  (acetylornithinase) (AO) (gene argE). AO is a homodimeric cobalt-dependent
  enzyme which displays broad specificity and can also deacylate substrates
  such as acetylarginine, acetylhistidine, acetylglutamate semialdehyde, etc.
- Succinyldiaminopimelate desuccinylase (EC 3.5.1.18) (SDAP) (gene dapE) is
  the enzyme which catalyzes the fifth step in the biosynthesis of lysine
  from aspartate semialdehyde: the hydrolysis of succinyl-diaminopimelate to
  diaminopimelate and succinate. SDAP is an enzyme that requires cobalt or
  zinc as a cofactor.
- Aminoacylase-1 [4] (EC 3.5.1.14) (N-acyl-L-amino-acid amidohydrolase)
  (ACY1). ACY1 is a homodimeric zinc-binding mammalian enzyme that catalyzes
  the hydrolysis of N-alpha-acylated amino acids (except for aspartate).
- Carboxypeptidase G2 (EC 3.4.17.11) (folate hydrolase G2) (gene cpg2) from
  Pseudomonas strain RS-16. This enzyme catalyzes the hydrolysis of reduced
  and non-reduced folates to pteroates and glutamate. G2 is a homodimeric
  zinc-dependent enzyme.
- Vacuolar carboxypeptidase S (EC 3.4.17.4) (yscS) from yeast (gene CPS1).
- Peptidase T (EC 3.4.11.-) (gene pepT) (tripeptidase) from bacteria. This
  enzyme cleaves a variety of tripeptides containing N-terminal methionine,
  leucine, or phenylalanine.
- Xaa-His dipeptidase (EC 3.4.13.3) (carnosinase) from Lactobacillus (gene
  pepV) [5], a metalloenzyme with activity against beta-alanyl-dipeptides
  including carnosine (beta-alanyl-histidine).

These enzymes share a few characteristics. They hydrolyse peptidic bonds in
substrates that share a common structure, they are dependent on cobalt or zinc
for their activity and they are proteins of 40 Kd to 60 Kd with a number of
regions of sequence similarity.

As signature patterns for these proteins, we selected two of the conserved
regions. The first pattern contains a conserved histidine which could be
involved in binding metal ions and the second pattern contains a number of
conserved charged residues.
Description of pattern(s) and/or profile(s)

Consensus pattern [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-
[STAV]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 6.

Consensus pattern [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-
[LIVMA]-x(2)-[LIVMFY]-x(14,17)-[LIVM]-x-[LIVMF]-[LIVMSTAG]-
[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note: these proteins belong to families M20A/M20B in the
classification of peptidases [6,E1].
Last update

November 1997 / Patterns and text revised.
References


Peptidase_M3


PDOC00129 


Neutral zinc 

metallopeptidases, zinc- 
binding region signature 


The majority of zinc-dependent metallopeptidases (with the notable exception
of the carboxypeptidases) share a common pattern of primary structure [1,2,3]
in the part of their sequence involved in the binding of zinc, and can be
grouped together as a superfamily, known as the metzincins, on the basis of
this sequence similarity. They can be classified into a number of distinct
families [4,E1] which are listed below along with the proteases which are
currently known to belong to these families.

Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).
- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may
  play a role in regulating growth and differentiation of early B-lineage
  cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Yeast hypothetical protein YIL137c.
- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the
  hydrolysis of an epoxide moiety of LTA-4 to form LTB-4; it has been shown
  that it binds zinc and is capable of peptidase activity.

Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I)
  (ACE) the enzyme responsible for hydrolyzing angiotensin I to angiotensin
  II. There are two forms of ACE: a testis-specific isozyme and a somatic
  isozyme which has two active centers.

Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the
  cytoplasmic degradation of small peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or
  microsomal endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is
  involved in the second stage of processing of some proteins imported in the
  mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase
  (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene
  opdA or prlC).
- Yeast hypothetical protein YKL134c.

Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral
  proteases (bacillolysins) (EC 3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two
  classes of insect antibacterial proteins, attacins and cecropins.

Family M7

- Streptomyces extracellular small neutral proteases.

Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface
  protease from various species of Leishmania.

Family M9

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio
  alginolyticus.

Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]:
  MMP-1 (EC 3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd
  gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23)
  (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3
  (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) (stromelysin-2), and
  MMP-11 (stromelysin-3), MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that
  allows the embryo to digest the protective envelope derived from the egg
  extracellular matrix.
- Soybean metalloendoproteinase 1.

Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.
- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border
  metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and
  bone formation and which expresses metalloendopeptidase activity. The
  Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein
  tolloid.
- Blastula protease 10 (BP10) from Paracentrotus lividus and the related
  protein SpAN from Strongylocentrotus purpuratus.
- Caenorhabditis elegans protein toh-2.
- Caenorhabditis elegans hypothetical protein F42A10.8.
- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching
  proteins LCE and HCE) from the fish Oryzias latipes. These proteases
  participate in the breakdown of the egg envelope, which is derived from
  the egg extracellular matrix, at the time of hatching.

Family M12B

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases
  that act in hemorrhage. Examples are: adamalysin II (EC 3.4.24.46),
  atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase
  (EC 3.4.24.72), trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).
- Mouse cell surface antigen MS2.

Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).
- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the
  precursor of endothelin to release the active peptide.
- Kell blood group glycoprotein, a major antigenic protein of erythrocytes.
  The Kell protein is very probably a zinc endopeptidase.
- Peptidase O from Lactococcus lactis (gene pepO).

Family M27

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various
  botulinum toxins (BoNT). These toxins are zinc proteases that block
  neurotransmitter release by proteolytic cleavage of synaptic proteins such
  as synaptobrevins, syntaxin and SNAP-25 [7,8].

Family M30

- Staphylococcus hyicus neutral metalloprotease.

Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an
  enzyme from Thermus aquaticus which is most active at high temperature.

Family M34

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins
  composing the anthrax toxin.

Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases
  from various species of Aspergillus.

Family M36

- Extracellular elastinolytic metalloproteinases from Aspergillus.

From the tertiary structure of thermolysin, the position of the residues
acting as zinc ligands and those involved in the catalytic activity are known.
Two of the zinc ligands are histidines which are very close together in the
sequence; C-terminal to the first histidine is a glutamic acid residue which
acts as a nucleophile and promotes the attack of a water molecule on the
carbonyl carbon of the substrate. A signature pattern which includes the two
histidines and the glutamic acid residue is sufficient to detect this
superfamily of proteins.
Description of pattern(s) and/or profile(s)

Consensus pattern [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-
H-x-[LIVMFYWGSPQ] [The two H's are zinc ligands] [E is the
active site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT 57; including
Neurospora crassa conidiation-specific protein 13 which could be
a zinc-protease.
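
To make the roles of the three signature residues concrete, the following Python sketch scans a sequence for the zinc-binding pattern and reports where the two histidine ligands and the active-site glutamate fall. The regular expression is a hand translation of the consensus pattern above (`x` becomes `.`, `{DEHRKP}` becomes `[^DEHRKP]`); the function name `zinc_site` and the test fragment are hypothetical and serve only as an illustration.

```python
import re

# Hand translation of
# [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]
ZINC_SIGNATURE = re.compile(r"[GSTALIVN]..HE[LIVMFYW][^DEHRKP]H.[LIVMFYWGSPQ]")

def zinc_site(sequence: str):
    """Return 1-based positions (first His, Glu, second His) or None if no match."""
    hit = ZINC_SIGNATURE.search(sequence)
    if hit is None:
        return None
    start = hit.start()
    # 0-based offsets inside the match: H at 3, E at 4, second H at 7.
    return (start + 4, start + 5, start + 8)

# Hypothetical fragment containing an H-E-x-x-H motif.
print(zinc_site("GIDVVAHELTHAVTDY"))  # (7, 8, 11)
```

As the entry notes, the pattern also hits sequences outside the superfamily, so a match is a hint rather than proof of zinc-protease activity.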
Last update

July 1999 / Text revised.


Peptidase_M48




Peptidase family M48 


Accession number: PF01435 

Definition: Peptidase family M48 

Author: Bateman A 

Alignment method of seed: Clusta!w_manual 

Source of seed members: Swiss-Prot 

Gathering cutoffs: -35 -35 

Trusted cutoffs: -34.00 -34.00 

Noise cutoffs: -42.20 -42.20 

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM 

Database Reference MEROPS; M48; 

Database Reference INTERPRO; IPR001915;

Database reference: PFAMB; PB008839; 

Database reference: PFAMB; PB041497; 

Number of members: 28 
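
Each Pfam entry in this table lists three pairs of score cutoffs. The sketch below shows one way a bit score might be compared against them. It assumes the usual Pfam reading, which the source does not spell out: the first number of each pair applies to the whole-sequence score, the gathering cutoff is the inclusion threshold, the trusted cutoff is the lowest score among accepted matches, and the noise cutoff is the highest score among rejected ones. The names `Cutoffs` and `classify_score` and the returned labels are hypothetical.

```python
from typing import NamedTuple

class Cutoffs(NamedTuple):
    """Per-sequence bit-score thresholds for one Pfam family."""
    gathering: float
    trusted: float
    noise: float

# Values copied from the Peptidase_M48 entry above (first number of each pair).
M48 = Cutoffs(gathering=-35.0, trusted=-34.0, noise=-42.2)

def classify_score(score: float, cutoffs: Cutoffs) -> str:
    """Place a bit score relative to the three cutoffs (assumed semantics)."""
    if score >= cutoffs.trusted:
        return "at or above trusted"
    if score >= cutoffs.gathering:
        return "accepted"                 # passes the inclusion threshold
    if score > cutoffs.noise:
        return "rejected, above noise"
    return "rejected"

for s in (-30.0, -34.5, -40.0, -50.0):
    print(s, classify_score(s, M48))
```

Negative cutoffs such as these are possible because the scores are log-odds values; only the ordering trusted >= gathering >= noise matters for the comparison.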


Peptidase_S24 




Peptidase family S24 


Accession number: PF00717 

Definition: Peptidase family S24 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_616 (release 2.1)

Gathering cutoffs: 1 1 

Trusted cutoffs: 2.00 2.00 

Noise cutoffs: -9.00 -9.00 
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HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference MEROPS; S24;

Database Reference: SCOP; 1umu; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR000129;
Database Reference PDB; 1adr ; 72; 76;
Database Reference PDB; 1lmb 3; 89; 92;
Database Reference PDB; 1lmb 4; 89; 92;
Database Reference PDB; 1leb ; 66; 72;
Database Reference PDB; 1umu A; 32; 123;
Database Reference PDB; 1umu B; 32; 123;
Database Reference PDB; 1ay9 A; 32; 123;
Database Reference PDB; 1ay9 B; 32; 123;
Database reference: PFAMB; PB005958;
Database reference: PFAMB; PB041146;
Database reference: PFAMB; PB041823;
Number of members: 42


Peptidase_S8 


PDOC00125 


Serine proteases, 
subtilase family, active 
sites 


Subtilases [1,2] are an extensive family of serine proteases 
whose catalytic 

activity is provided by a charge relay system similar to that of the 
trypsin 

family of serine proteases but which evolved by independent 
convergent 

evolution. The sequence around the residues involved in the 
catalytic triad 

(aspartic acid, serine and histidine) are completely different from 
that of 

the analogous residues in the trypsin serine proteases and can 
be used as 

signatures specific to that category of proteases. 

The subtilase family currently includes the following proteases: 

- Subtilisins (EC 3.4.21.62), these alkaline proteases from various Bacillus
species have been the target of numerous studies in the past thirty years.

- Alkaline elastase YaB from Bacillus sp. (gene ale).

- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).

- Aqualysin I from Thermus aquaticus (gene pstI).

- AspA from Aeromonas salmonicida.

- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).

- C5A peptidase from Streptococcus pyogenes (gene scpA).

- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.

- Extracellular serine protease from Serratia marcescens.

- Extracellular protease from Xanthomonas campestris.

- Intracellular serine protease (ISP) from various Bacillus.

- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).

- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).

- Nisin leader peptide processing protease nisP from Lactococcus lactis.

- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).

- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.

- Calcium-dependent protease from Anabaena variabilis (gene prcA).

- Halolysin from halophilic bacteria sp. 172p1 (gene hly).

- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).

- Alkaline proteinase from Cephalosporium acremonium (gene alp).

- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).

- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.

- KEX-1 protease from Kluyveromyces lactis.



- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).

- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).

- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).

- Proteinase R from Tritirachium album (gene proR).

- Proteinase T from Tritirachium album (gene proT).

- Subtilisin-like protease III from yeast (gene YSP3).

- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.

- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and
PACE4 protease from mammals, other vertebrates, and invertebrates. These
proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].

- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from
Human.

- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins
consist of two domains: a N-terminal subtilase catalytic domain and a
C-terminal ABC transporter domain (see <PDOC00185>).

Description of pattern (s) and/or profile(s) 

Consensus pattern [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-
[LIVMFC]-x(2,3)-[DNH] [D is the active site residue]
Sequences known to belong to this class detected by the pattern
the majority of subtilases with a few exceptions.
Other sequence(s) detected in SWISS-PROT 44.

Consensus pattern H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-
[STAGCLV]-[SAGM] [H is the active site residue]
Sequences known to belong to this class detected by the pattern
ALL, except for aspA and ssa1 which both seem to lack the
histidine active site.

Other sequence(s) detected in SWISS-PROT adenylate cyclase
type VIII.

Consensus pattern G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the
active site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for nisP, tagC and S.marcescens extracellular serine
protease.

Other sequence(s) detected in SWISS-PROT 6.

Note if a protein includes at least two of the three active site
signatures, the probability of it being a serine protease from the
subtilase family is 100%.

Note these proteins belong to family S8 in the classification of
peptidases [5,E1].
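
The "at least two of the three active site signatures" note can be sketched as a simple count. The regular expressions below are hand translations of the three consensus patterns above, and the function names and test peptides are hypothetical; the 100% figure is the entry's own claim, not something this code establishes.

```python
import re

# Hand-translated from the three subtilase consensus patterns (assumed correct
# transcriptions of the Asp, His and Ser active-site signatures).
SUBTILASE_SIGNATURES = {
    "ASP": re.compile(r"[STAIV].[LIVMF][LIVM]D[DSTA]G[LIVMFC].{2,3}[DNH]"),
    "HIS": re.compile(r"HG[STM].[VIC][STAGC][GS].[LIVMA][STAGCLV][SAGM]"),
    "SER": re.compile(r"GTS.[SA].P.{2}[STAVC][AG]"),
}

def matched_signatures(sequence: str) -> list:
    """Return the names of the signatures found anywhere in the sequence."""
    return [name for name, rx in SUBTILASE_SIGNATURES.items()
            if rx.search(sequence)]

def has_two_of_three(sequence: str) -> bool:
    """Apply the two-of-three rule stated in the note above."""
    return len(matched_signatures(sequence)) >= 2

# Hypothetical peptides built only to exercise the rule.
two_hits = "VAVIDSGIDSSH" + "KKK" + "HGTHVAGTVAA"
one_hit = "GTSMASPHVAG"
print(matched_signatures(two_hits), has_two_of_three(two_hits))
print(matched_signatures(one_hit), has_two_of_three(one_hit))
```

Counting signatures rather than requiring all three matches the entry's own exceptions: aspA and ssa1 lack the histidine signature, and nisP and tagC are missed by the serine one.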
Expert(s) to contact by email 
Brannigan J. jab5@vaxa.york.ac.uk 

Siezen R.J. siezen@nizo.nl 

Last update 

November 1997 / Patterns and text revised. 
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Peptidase_S9 


PDOC00587 


Prolyl oligopeptidase 
family serine active site 


The prolyl oligopeptidase family [1,2,3] consists of a number of
evolutionarily related peptidases whose catalytic activity seems to be
provided by a charge relay system similar to that of the trypsin family
of serine proteases, but which evolved by independent convergent
evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving
enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side
of prolyl residues. The sequence of PE has been obtained from a mammalian
species (pig) and from bacteria (Flavobacterium meningosepticum and
Aeromonas hydrophila); there is a high degree of sequence conservation
between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB)
which cleaves peptide bonds on the C-terminal side of lysyl and argininyl
residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that
removes N-terminal dipeptides sequentially from polypeptides having
unsubstituted N-termini provided that the penultimate residue is proline.

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is
responsible for the proteolytic maturation of the alpha-factor precursor.

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase).
This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of
an N-acetylated protein to generate a N-acetylated amino acid and a protein
with a free amino-terminus.

A conserved serine residue has experimentally been shown (in E.coli protease
II as well as in pig and bacterial PE) to be necessary for the catalytic
mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp),
is generally located about 150 residues away from the C-terminal extremity of
these enzymes (which are all proteins that contain about 700 to 800 amino
acids).



Description of pattern (s) and/or profile(s) 

Consensus pattern D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-
[LIVMFYW](2) [S is the active site residue]

Sequences known to belong to this class detected by the pattern 

ALL, except for yeast DPAP A. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note these proteins belong to families S9A/S9B/S9C in the 
classification of peptidases [4,E1]. 
Last update 

November 1997 / Text revised. 
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Peptidase family U7 


Accession number: PF01343

Definition: Peptidase family U7

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_707 (release 2.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 47.60 47.60

Noise cutoffs: -55.60 -55.60

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference MEROPS; U7;

Database Reference INTERPRO; IPR002142;

Number of members: 37 


PEP-utilizers 


PDOC00527 


PEP-utilizing enzymes
signatures 


A number of enzymes that catalyze the transfer of a phosphoryl group from
phosphoenolpyruvate (PEP) via a phospho-histidine intermediate have been
shown to be structurally related [1,2,3,4]. These enzymes are:

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the
reversible phosphorylation of pyruvate and phosphate by ATP to PEP and
diphosphate. In plants PPDK functions in the direction of the formation of
PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean
acid metabolism plants. In some bacteria, such as Bacteroides symbiosus,
PPDK functions in the direction of ATP synthesis.

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This
enzyme catalyzes the reversible phosphorylation of pyruvate by ATP to form
PEP, AMP and phosphate, an essential step in gluconeogenesis when pyruvate
and lactate are used as a carbon source.

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the
first enzyme of the phosphoenolpyruvate-dependent sugar phosphotransferase
system (PTS), a major carbohydrate transport system in bacteria. The PTS
catalyzes the phosphorylation of incoming sugar substrates concomitant
with their translocation across the cell membrane. The general mechanism
of the PTS is the following: a phosphoryl group from PEP is transferred
to enzyme-I (EI) of PTS which in turn transfers it to a phosphoryl carrier
protein (HPr). Phospho-HPr then transfers the phosphoryl group to a
sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and
transfer the phosphoryl group from it to a histidine residue. The sequence
around that residue is highly conserved and can be used as a signature
pattern for these enzymes. As a second signature pattern we selected a
conserved region in the C-terminal part of the PEP-utilizing enzymes. The
biological significance of this region is not yet known.



Description of pattern (s) and/or profile(s) 

Consensus pattern G-[GA]-x-[STN]-x-H-[STA]-[STAV]-[LIVM](2)-
[STAVHRG] [H is phosphorylated]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-
[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMFY]-[GAS]-x(2)-R
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Patterns and text revised. 
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Putative peptidoglycan 
binding domain 



Accession number: PF01476 

Definition: Putative peptidoglycan binding domain 

Author: Bateman A 

Alignment method of seed: HMM_built_from_alignment

Source of seed members: Bateman A 

Gathering cutoffs: 22 22 

Trusted cutoffs: 22.40 22.10 

Noise cutoffs: 21.10 21.10 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 92324582 
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Reference Title: Modular design of the Enterococcus hirae 
muramidase-2 and

Reference Title: Streptococcus faecalis autolysin.
Reference Author: Joris B, Englebert S, Chu CP, Kariyama R, Daneo-Moore L,

Reference Author: Shockman GD, Ghuysen JM;

Reference Location: FEMS Microbiol Lett 1992;70:257-264.

Database Reference INTERPRO; IPR002482;

Database reference: PFAMB; PB019287; 

Database reference: PFAMB; PB040847; 

Database reference: PFAMB; PB040977; 

Comment: This domain is about 40 residues long. It is found in a variety

Comment: of enzymes involved in bacterial cell wall degradation [1]. This

Comment: domain may have a general peptidoglycan binding function.

Number of members: 197 


phoslip 


PDOC00109 


Phospholipase A2 active 
sites signatures 


Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty
acids from the second carbon group of glycerol. PA2's are small and rigid
proteins of 120 amino-acid residues that have four to seven disulfide bonds.
PA2 binds a calcium ion which is required for activity. The side chains of
two conserved residues, a histidine and an aspartic acid, participate in a
'catalytic network'.

Many PA2's have been sequenced from snakes, lizards, bees and mammals. In the
latter, there are at least four forms: pancreatic, membrane-associated as
well as two less characterized forms. The venom of most snakes contains
multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit
neuromuscular transmission by blocking acetylcholine release from the nerve
termini.

We derived two different signature patterns for PA2's. The first is 
centered 

on the active site histidine and contains three cysteines 
involved in 

disulfide bonds. The second is centered on the active site 
aspartic acid and 

also contains three cysteines involved in disulfide bonds. 
Description of pattern(s) and/or profile(s) 

Consensus pattern C-C-x(2)-H-x(2)-C [H is the active site residue] 
Sequences known to belong to this class detected by the pattern 
ALL known functional PA2's. However, this pattern will not detect 
some snake toxins homologous with PA2 but which have lost their 
catalytic activity as well as otoconin-22, a Xenopus protein from 
the aragonitic otoconia which is also unlikely to be enzymatically 
active. 

Other sequence(s) detected in SWISS-PROT 15. 

Consensus pattern [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is
the active site residue]

Sequences known to belong to this class detected by the pattern
the majority of functional and non-functional PA2's. Undetected
sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu
and PA2 PA-5 from mulga.

Other sequence(s) detected in SWISS-PROT 12. 

Expert(s) to contact by email 

Seilhamer J. J. ieff@incyte.com 
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PI3_PI4_kinase 


PDOC00710 


Phosphatidylinositol 3- 
and 4-kinases signatures 


Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme
that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol
ring. The exact function of the three products of PI3-kinase - PI-3-P,
PI-3,4-P(2) and PI-3,4,5-P(3) - is not yet known, although it is proposed
that they function as second messengers in cell signalling. Currently, three
forms of PI3-kinase are known:

- The mammalian enzyme which is a heterodimer of a 110 Kd catalytic chain
(p110) and an 85 Kd subunit (p85) which allows it to bind to activated
tyrosine protein kinases. There are at least two different types of p110
subunits (alpha and beta).

- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle
activation. Both are proteins of about 280 Kd.

- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation.
VPS34 is a protein of about 100 Kd.

- Arabidopsis thaliana and soybean VPS34 homologs.

Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme
that acts on phosphatidylinositol (PI) in the first committed step in the
production of the second messenger inositol-1,4,5-trisphosphate. Currently
the following forms of PI4-kinases are known:

- Human PI4-kinase alpha.

- Yeast PIK1, a nuclear protein of 120 Kd.

- Yeast STT4, a protein of 214 Kd.

The PI3- and PI4-kinases share a well conserved domain at their C-terminal
section; this domain seems to be distantly related to the catalytic domain of
protein kinases [2]. We developed two signature patterns from the best
conserved parts of this domain.

Four additional proteins belong to this family:

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the
target for the cell-cycle arrest and immunosuppressive effects of the
FKBP12-rapamycin complex.

- Yeast protein ESR1 [6] which is required for cell growth, DNA 
repair and 

meiotic recombination. 

- Yeast protein TEL1 which is involved in controlling telomere 
length. 

- Yeast hypothetical protein YHR099w, a distantly related 
member of this 

family. 
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- Fission yeast hypothetical protein SpAC22E12.16C. 
Description of pattern (s) and/or profile(s) 

Consensus pattern [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q- 
[DE]-x(4)-Q 

Sequences known to belong to this class detected by the pattern 

ALL, except for yeast YHR099w. 

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-
x-[LIVMF]-x-D-R-H-x(2)-N

Sequences known to belong to this class detected by the pattern 

ALL, except for yeast YHR099W. 

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

November 1997 / Patterns and text revised. 
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P-ll 


PDOC00439 


P-II protein signatures


The P-II protein (gene glnB) is a bacterial protein important for the control
of glutamine synthetase [1,2,3]. In nitrogen-limiting conditions, when the
ratio of glutamine to 2-ketoglutarate decreases, P-II is uridylylated on a
tyrosine residue to form P-II-UMP. P-II-UMP allows the deadenylation of
glutamine synthetase (GS), thus activating the enzyme. Conversely, in nitrogen
excess, P-II-UMP is deuridylated and then promotes the adenylation of GS. P-II
also indirectly controls the transcription of the GS gene (glnA) by preventing
NR-II (ntrB) from phosphorylating NR-I (ntrC), which is the transcriptional
activator of glnA. Once P-II is uridylylated, these events are reversed.

P-II is a protein of about 110 amino acid residues, extremely well conserved.
The tyrosine which is uridylylated is located in the central part of the
protein.
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pilin 



PDOC00342 



Prokaryotic N-terminal 
methylation site 



In cyanobacteria, P-II seems to be phosphorylated on a serine residue rather
than being uridylylated.

In methanogenic archaebacteria, the nitrogenase iron protein gene (nifH) is
followed by two open reading frames highly similar to the eubacterial P-II
protein [4]. These proteins could be involved in the regulation of nitrogen
fixation.

In the red alga, Porphyra purpurea, there is a glnB homolog encoded in the
chloroplast genome.

Other proteins highly similar to glnB are:

- Bacillus subtilis protein nrgB [5].

- Escherichia coli hypothetical protein ybaI [6].

We developed two signature patterns for P-II protein. The first one is a
conserved stretch (in eubacteria) of six residues which contains the
uridylylated tyrosine, the other is derived from a conserved region in the
C-terminal part of the P-II protein.

Description of pattern(s) and/or profile(s) 

Consensus pattern Y-[KR]-G-[AS]-[AE]-Y [The second Y is 
uridylated] 

Sequences known to belong to this class detected by the pattern 

ALL glnB's from eubacteria. 

Other sequence(s) detected in SWISS-PROT 4. 

Consensus pattern [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-
[LIVM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Patterns and text revised. 
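
The first P-II signature marks the modified residue as the second tyrosine of the six-residue stretch. A minimal sketch of locating that residue in a sequence follows; the regular expression is a hand translation of Y-[KR]-G-[AS]-[AE]-Y, and the function name and test peptide are hypothetical.

```python
import re
from typing import Optional

# Hand-translated from the consensus pattern Y-[KR]-G-[AS]-[AE]-Y.
PII_TYR_SIGNATURE = re.compile(r"Y[KR]G[AS][AE]Y")

def modified_tyrosine_position(sequence: str) -> Optional[int]:
    """Return the 1-based position of the second Y of the signature, or None."""
    m = PII_TYR_SIGNATURE.search(sequence)
    if m is None:
        return None
    # The signature is six residues long, so its last residue (the second Y)
    # sits five positions after the match start; add one for 1-based numbering.
    return m.start() + 6

# Hypothetical peptide containing the stretch YRGSEY.
print(modified_tyrosine_position("AAAYRGSEYTT"))
```

Only the first occurrence is reported, which fits a protein of about 110 residues where the signature is expected once.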
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A number of bacteria express filamentous adhesins known as pili. 
The pili are polar flexible filaments of about 5.4 nm diameter and 2500 nm
average length; they consist of a single polypeptide chain (called pilin or
fimbrial protein) arranged in a helical configuration of five subunits per
turn in the assembled pilus. Gram-negative bacteria produce pilins which are
characterized by the presence of a very short leader peptide of 6 to 7
residues, followed by a methylated N-terminal phenylalanine residue and by a
highly conserved sequence of about 24 hydrophobic residues. This class of
pilin is often referred to as NMePhe or type-4 pili [1,2].

Recently a number of bacterial proteins have been sequenced 
which share the 

following structural characteristics with type-4 pili [3]: 

a) The N-terminal residue, which is methylated, is hydrophobic (generally a
phenylalanine or a methionine);

b) The leader peptide is hydrophilic, consists of 5 to 10 residues (with two
exceptions, see below) and ends with a glycine;

c) The fifth residue of the mature sequence is a glutamate which seems to be
required for the methylation step;

d) The first twenty residues of the mature sequence are highly hydrophobic.
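
The four characteristics a) to d) can be sketched as a set of checks on a precursor sequence whose leader length is already known. Several details are assumptions made only for illustration: the hydrophobic residue set, the 0.6 threshold used for "highly hydrophobic", and the function name. The hydrophilic character of the leader in b) is not tested, and the two longer-leader exceptions mentioned in the text would fail the length check.

```python
# Assumed set of hydrophobic residues; the source does not define one.
HYDROPHOBIC = set("AFILMVWY")

def check_type4_pilin_features(precursor: str, leader_len: int,
                               min_hydrophobic: float = 0.6) -> dict:
    """Check criteria a) to d) for a precursor with a known leader length."""
    leader = precursor[:leader_len]
    mature = precursor[leader_len:]
    first20 = mature[:20]
    fraction = (sum(r in HYDROPHOBIC for r in first20) / len(first20)
                if first20 else 0.0)
    return {
        # a) hydrophobic N-terminal residue of the mature sequence
        "a_hydrophobic_n_terminus": bool(mature) and mature[0] in HYDROPHOBIC,
        # b) leader of 5 to 10 residues ending with a glycine
        "b_leader_5_to_10_ending_in_G": (5 <= leader_len <= 10
                                         and leader.endswith("G")),
        # c) glutamate as fifth residue of the mature sequence
        "c_glutamate_at_position_5": len(mature) >= 5 and mature[4] == "E",
        # d) first twenty mature residues mostly hydrophobic (assumed cutoff)
        "d_hydrophobic_first_20": (len(first20) == 20
                                   and fraction >= min_hydrophobic),
    }

# Hypothetical precursor: 6-residue leader followed by a hydrophobic stretch.
example = "MKAQKG" + "FTLIELMIVVAIIGILAAIA" + "IPQ"
print(check_type4_pilin_features(example, leader_len=6))
```

Returning one flag per criterion, rather than a single verdict, makes it easy to see which condition a candidate such as tcpA (25-residue leader) fails.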

These proteins are listed below: 

- Four proteins in an operon involved in a general secretion 
pathway (GSP) 

for the export of proteins (also called the type II pathway) [4]. 
These 

proteins have been assigned a different gene name in each of 
the species 
where they have been sequenced: 



Species                    Gene names

Aeromonas hydrophila       exeG  exeH  exeI  exeJ
Erwinia chrysanthemi       outG  outH  outI  outJ
Escherichia coli           hofG  hofH  yheH  yheI
Klebsiella pneumoniae      pulG  pulH  pulI  pulJ
Pseudomonas aeruginosa     xcpT  xcpU  xcpV  xcpW
Vibrio cholerae            epsG  epsH  epsI  epsJ
Xanthomonas campestris     xpsG  xpsH  xpsI  xpsJ

- Vibrio cholerae toxin co-regulated pilin (gene tcpA). This pilin 
has a much 

longer putative leader peptide (25 residues). 

- Bacillus subtilis comG competence operon proteins 3, 4, and 
5 which are 

involved for the uptake of DNA by competent Bacillus subtilis 
cells. 

- ppdA, ppdB and ppdC, three Escherichia coli hypothetical 
proteins found in 

the thyA-recC intergenic region. 

- ppdA, a hypothetical protein near the groeLS operon of 
Clostridium 

perfringens. The putative leader peptide is 23 residues long. 

We developed a signature pattern based on the N-terminal 
conserved region of 
all these proteins. 



Description of pattern (s) and/or profile(s) 

Consensus pattern [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-
E-[LIVMFWSTAG](14) [The residue after the G is methylated]
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Sequences known to belong to this class detected by the pattern 
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

November 1995 / Text revised.
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PLA2_B 




Lysophospholipase 
catalytic domain 


Accession number: PF01735 

Definition: Lysophospholipase catalytic domain 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_2127 (release 4.1)

Gathering cutoffs: -283 -283 

Trusted cutoffs: -185.70 -185.70

Noise cutoffs: -380.50 -380.50 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 94299545 

Reference Title: Delineation of two functionally distinct domains of

Reference Title: cytosolic phospholipase A2, a regulatory Ca(2+)-dependent

Reference Title: lipid-binding domain and a Ca(2+)-independent catalytic
Reference Title: domain.

Reference Author: Nalefski EA, Sultzman LA, Martin DM, Kriz RW, Towler PS,

Reference Author: Knopf JL, Clark JD;

Reference Location: J Biol Chem 1994;269:18239-18249.

Reference Number: [2]

Reference Medline: 94327513

Reference Title: The Saccharomyces cerevisiae PLB1 gene encodes a protein

Reference Title: required for lysophospholipase and phospholipase B

Reference Title: activity.

Reference Author: Lee KS, Patton JL, Fido M, Hines LK, Kohlwein SD, Paltauf

Reference Author: F, Henry SA, Levin DE;

Reference Location: J Biol Chem 1994;269:19725-19730.

Database Reference: SCOP; 1rlw; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002642;

Database Reference PDB; 1bci ; 110; 138;

Database Reference PDB; 1cjy B; 1110; 1430;

Database Reference PDB; 1cjy A; 110; 498;

Database Reference PDB; 1rlw ; 110; 140;

Database Reference PDB; 1cjy B; 1463; 1497;

Database Reference PDB; 1cjy B; 1539; 1717;

Database Reference PDB; 1cjy A; 539; 721;

Comment: This family consists of Lysophospholipase / phospholipase B

Comment: EC:3.1.1.5 and cytosolic phospholipase A2 EC:3.1.1.4 which also

Comment: has a C2 domain C2.

Comment: Phospholipase B enzymes catalyse the release of fatty acids from

Comment: lysophospholipids and are capable in vitro of hydrolyzing all

Comment: phospholipids extractable from yeast cells [1].

Comment: Cytosolic phospholipase A2 associates with natural membranes in

Comment: response to physiological increases in Ca2+ and selectively

Comment: hydrolyses arachidonyl phospholipids [2]. The aligned region

Comment: corresponds to the carboxy-terminal Ca2+-independent catalytic

Comment: domain of the protein as discussed in [2].
Number of members: 23 


PLAT 




PLAT/LH2 domain 


Accession number: PF01477 

Definition: PLAT/LH2 domain 

Author: Bateman A 

Alignment method of seed: Manual 

Source of seed members: Bateman A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 29.40 29.40 

Noise cutoffs: -7.90 -7.90

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Database Reference: SCOP; 1lpa; fa; [SCOP-USA][CATH-PDBSUM]

Database reference: PROSITE_PROFILE; PS50095;
Database Reference INTERPRO; IPR001024;
Database Reference PDB; 1lox ; 2; 112;
Database Reference PDB; 1hpl B; 336; 445;
Database Reference PDB; 1hpl A; 338; 447;
Database Reference PDB; 1eth C; 337; 403;
Database Reference PDB; 1eth A; 339; 405;
Database Reference PDB; 1eth C; 403; 445;
Database Reference PDB; 1eth A; 405; 447;
Database Reference PDB; 1rp1 ; 339; 449;
Database Reference PDB; 1bu8 A; 340; 407;
Database Reference PDB; 1bu8 A; 415; 452;
Database Reference PDB; 1gpl ; 322; 334;
Database Reference PDB; 1ca1 ; 256; 370;
Database Reference PDB; 1qm6 A; 256; 370;
Database Reference PDB; 1qm6 B; 256; 370;
Database Reference PDB; 1qmd A; 256; 370;
Database Reference PDB; 1qmd B; 256; 370;
Comment: This domain is found in a variety of membrane or

Comment: lipid associated proteins. It is called the PLAT

Comment: (Polycystin-1, Lipoxygenase, Alpha-Toxin) domain or

Comment: LH2 (Lipoxygenase homology) domain. The known structure

Comment: of pancreatic lipase shows this domain binds to procolipase

Comment: Colipase, which mediates membrane association.

Comment: So it appears possible that this domain mediates membrane

Comment: attachment via other protein binding partners. The

Comment: structure of this domain is known for many members of the

Comment: family and is composed of a beta sandwich.
Number of members: 82 


PLRV_ORF5 




Potato leaf roll virus 
readthrough protein 


Accession number: PF01690 

Definition: Potato leaf roll virus readthrough protein 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1335 (release 4.1)

Gathering cutoffs: 25 25 

Trusted cutoffs: 116.40 116.40

Noise cutoffs: -285.50 -285.50 
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HMM build command line: hmmbuild -F HMM SEED 
HMM build command line: hmmcalibrate -seed 0 HMM 
Reference Number: [1] 
Reference Medline: 94233771 

Reference Title: Changes in the amino acid sequence of the coat protein

Reference Title: readthrough domain of potato leaf roll luteovirus affect the

Reference Title: formation of an epitope and aphid transmission.

Reference Author: Jolly CA, Mayo MA;

Reference Location: Virology 1994;201:182-185.

Database Reference INTERPRO; IPR002929; 

Comment: This family consists mainly of the potato leaf roll virus

Comment: readthrough protein. This is generated via a readthrough

Comment: of open reading frame 3 a coat protein allowing transcription

Comment: of open reading frame 5 to give an extended coat protein

Comment: with a large c-terminal addition or read through domain [1].

Comment: The readthrough protein is thought to play a role in the

Comment: circulative aphid transmission of potato leaf roll virus [1].

Comment: Also in the family is open reading frame 6 from beet western

Comment: yellows virus and potato leaf roll virus both luteovirus and

Comment: an unknown protein from cucurbit aphid-borne yellows virus a
Comment: closterovirus.
Number of members: 28 


PMSR 




Peptide methionine 
sulfoxide reductase 


Accession number: PF01625 

Definition: Peptide methionine sulfoxide reductase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1111 (release 4.1)

Gathering cutoffs: -62 -62 

Trusted cutoffs: -28.00 -28.00 

Noise cutoffs: -96.70 -96.70 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1]

Reference Medline: 96353931

Reference Title: Peptide methionine sulfoxide reductase contributes to the

Reference Title: maintenance of adhesins in three major pathogens.

Reference Author: Wizemann TM, Moskovitz J, Pearce BJ, Cundell D, Arvidson

Reference Author: CG, So M, Weissbach H, Brot N, Masure HR;

Reference Location: Proc Natl Acad Sci U S A 1996;93:7985-7990.

Reference Number: [2]
Reference Medline: 96312545

Reference Title: Cloning the expression of a mammalian gene involved in the

Reference Title: reduction of methionine sulfoxide residues in proteins.

Reference Author: Moskovitz J, Weissbach H, Brot N;
Reference Location: Proc Natl Acad Sci U S A 1996;93:2095-2099.

Database Reference INTERPRO; IPR002569;

Comment: This enzyme repairs damaged proteins. 

Methionine sulfoxide in proteins 

Comment: is reduced to methionine. 

Number of members: 28 


Pollen_allerg_2




Ribonuclease (pollen allergen)


Accession number: PF01620 
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Definition: Ribonuclease (pollen allergen) 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1050 (release 4.1)

Gathering cutoffs: -3 -3 

Trusted cutoffs: 23.10 23.10

Noise cutoffs: -29.40 -29.40 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 



Reference Number: [1]

Reference Medline: 95246885

Reference Title: Major allergen Phl p Vb in timothy grass is a novel pollen

Reference Title: RNase.

Reference Author: Bufe A, Schramm G, Keown MB, Schlaak M, Becker WM;

Reference Location: FEBS Lett 1995;363:6-12.

Database Reference INTERPRO; IPR002914;

Database reference: PFAMB; PB037130;

Comment: This family contains grass pollen proteins of group V.

Comment: Swiss:Q40963 has been shown to possess ribonuclease

Comment: activity [1].

Number of members: 27


Pyruvate flavodoxin/ferredoxin
oxidoreductase (N terminus)



Accession number: PF01855 

Definition: Pyruvate flavodoxin/ferredoxin oxidoreductase 

(N terminus) 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_323 (release 4.2) 

Gathering cutoffs: -116 -116

Trusted cutoffs: -113.60 -113.60

Noise cutoffs: -119.50 -119.50 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96125254

Reference Title: Molecular and phylogenetic 

characterization of pyruvate and 

Reference Title: 2-ketoisovalerate ferredoxin 

oxidoreductases from 

Reference Title: Pyrococcus furiosus and pyruvate 
ferredoxin oxidoreductase 
Reference Title: from Thermotoga maritima. 
Reference Author: Kletzin A, Adams MW; 
Reference Location: J Bacteriol 1996;178:248-257.

Reference Number: [2]

Reference Medline: 94022264

Reference Title: Growth of the cyanobacterium Anabaena
on molecular

Reference Title: nitrogen: NifJ is required when iron is
limited.

Reference Author: Bauer CC, Scappino L, Haselkorn R;

Reference Location: Proc Natl Acad Sci U S A 1993;90:8812-8816.

Reference Number: [3]

Reference Medline: 99140300

Reference Title: Crystal structures of the key anaerobic
enzyme

Reference Title: pyruvate:ferredoxin oxidoreductase, free
and in complex

Reference Title: with pyruvate.

Reference Author: Chabriere E, Charon MH, Volbeda A,
Pieulle L, Hatchikian

Reference Author: EC, Fontecilla-Camps JC;

Reference Location: Nat Struct Biol 1999;6:182-190.

Database Reference: SCOP; 2pda; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference: SCOP; 2pda; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002880;

Database Reference PDB; 1b0p A; 43; 328;

Database Reference PDB; 1b0p B; 43; 328;

Database Reference PDB; 2pda A; 43; 328;

Database Reference PDB; 2pda B; 43; 328;

Database reference: PFAMB; PB014847;

Comment: This family includes the N terminal region of 

the pyruvate ferredoxin 

Comment: oxidoreductase, corresponding to the first 
two structural domains. 

Comment: This region is involved in inter subunit 
contacts [3]. Pyruvate 

Comment: oxidoreductase (POR) catalyses the final 
step in the fermentation 

Comment: of carbohydrates in anaerobic 
microorganisms [1]. This involves the 

Comment: oxidative decarboxylation of pyruvate with 
the participation of 

Comment: thiamine followed by the transfer of an 
acetyl moiety to coenzyme 

Comment: A for the synthesis of acetyl-CoA [1]. The
family also includes

Comment: pyruvate flavodoxin oxidoreductase as 
encoded by the nifJ gene in 

Comment: cyanobacterium which is required for growth 
on molecular nitrogen 

Comment: when iron is limited [2]. 
Number of members: 55 


PPE 




PPE family 


Accession number: PF00823 

Definition: PPE family 

Author: Bateman A 

Alignment method of seed: Clustalw_manual

Source of seed members: Pfam-B_297 (release 3.0) 

Gathering cutoffs: -90 -90 

Trusted cutoffs: -88.20 -88.20 

Noise cutoffs: -105.30 -105.30

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98295987 

Reference Title: Deciphering the biology of Mycobacterium 
tuberculosis from 

Reference Title: the complete genome sequence. 
Reference Author: 

Reference Location: Nature 1998;393:537-544.

Database Reference INTERPRO; IPR000030; 

Database reference: PFAMB; PB040834; 

Comment: This family is named after a PPE motif near to

the amino

Comment: terminus of the domain. The PPE family of
proteins

Comment: all contain an amino-terminal region of
about 180

Comment: amino acids. The carboxyl termini of this
family

Comment: are variable, and on the basis of this region 
fall 

Comment: into at least three groups. The MPTR 
subgroup has 

Comment: tandem copies of a motif NXGXGNXG. The 
second subgroup 

Comment: contains a conserved motif at about position 
350. 

Comment: The third group are only related in the amino 
terminal 

Comment: region. 

Comment: The function of these proteins is uncertain 
but it 

Comment: has been suggested that they may be 
related to 

Comment: antigenic variation of Mycobacterium 

tuberculosis [1]. 

Number of members: 75 
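
The MPTR subgroup described above is characterised by tandem copies of the motif NXGXGNXG. A minimal Python sketch of how such runs might be located in a protein sequence is given here. It assumes that X stands for any residue and that two or more adjacent copies count as a tandem run; the function name and the toy sequence are hypothetical and do not come from the Pfam entry.

```python
import re

# One copy of the motif, reading X as "any residue" (assumption).
MOTIF = "N.G.GN.G"
MOTIF_LENGTH = 8
# Two or more back-to-back copies are treated as a tandem run (assumed threshold).
TANDEM = re.compile(f"(?:{MOTIF}){{2,}}")


def find_tandem_runs(sequence):
    """Return (start, end, copies) for each run of adjacent motif copies."""
    runs = []
    for match in TANDEM.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // MOTIF_LENGTH
        runs.append((match.start(), match.end(), copies))
    return runs


# Hypothetical toy sequence with three adjacent copies of the motif.
toy = "MSFAA" + "NAGSGNIG" + "NTGIGNSG" + "NLGFGNTG" + "KLV"
print(find_tandem_runs(toy))
```

Positions are zero-based and the end index is exclusive, following Python slicing.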


PRA-CH 




Phosphoribosyl-AMP
cyclohydrolase


Accession number: PF01502

Definition: Phosphoribosyl-AMP cyclohydrolase

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_782 (release 4.0) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 88.20 88.20 

Noise cutoffs: -44.30 -44.30 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 99129952

Reference Title: N1-(5'-phosphoribosyl)adenosine-5'-monophosphate

Reference Title: cyclohydrolase: purification and 
characterization of a 

Reference Title: unique metalloenzyme. 

Reference Author: D'Ordine RL, Klem TJ, Davisson VJ; 

Reference Location: Biochemistry 1999;38:1537-1546.

Database Reference INTERPRO; IPR002496;

Comment: This enzyme catalyses the third step in the 

histidine 

Comment: biosynthetic pathway. It requires Zn ions for 
activity. 

Number of members: 28 
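
Each family entry lists gathering, trusted and noise cutoffs. The sketch below shows one way a hit score could be compared against them. It rests on assumptions that the listing itself does not state: that a hit scoring at or above the gathering cutoff is accepted as a family member, that the noise cutoff marks the highest score recorded for a rejected hit, and that the first of the two numbers on each line applies to the whole-sequence score. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    gathering: float
    trusted: float
    noise: float


def first_value(line):
    """Return the first number after the colon in a '... cutoffs: a b' line."""
    return float(line.partition(":")[2].split()[0])


def classify(score, cutoffs):
    """Place a bit score relative to the cutoffs (assumed interpretation)."""
    if score >= cutoffs.gathering:
        return "accepted"
    if score <= cutoffs.noise:
        return "rejected (within recorded noise range)"
    return "rejected (between noise and gathering)"


# Values copied from the PRA-CH entry above.
pra_ch = Cutoffs(
    gathering=first_value("Gathering cutoffs: 25 25"),
    trusted=first_value("Trusted cutoffs: 88.20 88.20"),
    noise=first_value("Noise cutoffs: -44.30 -44.30"),
)

for score in (90.0, 10.0, -50.0):
    print(score, classify(score, pra_ch))
```

The trusted cutoff is carried along for completeness but is not used in the decision, since under the assumed reading only the gathering cutoff decides membership.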


PRA-PH 




Phosphoribosyl-ATP 
pyrophosphohydrolase 


Accession number: PF01503 

Definition: Phosphoribosyl-ATP pyrophosphohydrolase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_784 (release 4.0)

Gathering cutoffs: 6 6 

Trusted cutoffs: 12.10 12.10 

Noise cutoffs: 1.00 1.00 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 79216449

Reference Title: The product of the his4 gene cluster in 
Saccharomyces 

Reference Title: cerevisiae. A trifunctional polypeptide. 
Reference Author: Keesey JK Jr, Bigelis R, Fink GR; 
Reference Location: J Biol Chem 1979 Aug 10;254:7427- 
7433. 

Reference Number: [2] 

Reference Medline: 86310274

Reference Title: Primary and secondary structural 

homologies between the 

Reference Title: HIS4 gene product of Saccharomyces 
cerevisiae and the hisIE

Reference Title: and hisD gene products of Escherichia coli 
and Salmonella 

Reference Title: typhimurium. 

Reference Author: Bruni CB, Carlomagno MS, Formisano S, 
Paolella G; 

Reference Location: Mol Gen Genet 1986;203:389-396.
Database Reference INTERPRO; IPR002497;
Comment: This enzyme catalyses the second step in 
the histidine 

Comment: biosynthetic pathway. 
Number of members: 32 


PseudoU_synth_1




tRNA pseudouridine 
synthase 


Accession number: PF01416 

Definition: tRNA pseudouridine synthase 

Previous Pfam IDs: PseudoU_synt; 

Author: Howe K 

Alignment method of seed: Clustalw 

Source of seed members: swissprot 

Gathering cutoffs: 30 30 

Trusted cutoffs: 39.10 39.10

Noise cutoffs: -55.00 -55.00 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98254513 
Reference Title: Transfer RNA-pseudouridine synthetase
Pus1 of Saccharomyces

Reference Title: cerevisiae contains one atom of zinc 
essential for its 

Reference Title: native conformation and tRNA recognition. 
Reference Author: Arluison V, Hountondji C, Robert B, 
Grosjean H; 

Reference Location: Biochemistry 1998;37:7268-7276.

Database Reference INTERPRO; IPR001406;

Database reference: PFAMB; PB027500; 

Comment: Involved in the formation of pseudouridine at 

the anticodon stem 

Comment: and loop of transfer-RNAs 

Comment: Pseudouridine is an isomer of uridine

Comment: (5-(beta-D-ribofuranosyl)uracil), and is the most abundant modified

nucleoside found in 

Comment: all cellular RNAs. 

Comment: The TruA-like proteins also exhibit a 

conserved sequence with 

Comment: a strictly conserved aspartic acid, likely 
involved in catalysis 
Number of members: 31 


PseudoU_synth_2 




RNA pseudouridylate 
synthase 


Accession number: PF00849 

Definition: RNA pseudouridylate synthase 

Previous Pfam IDs: YABO; 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_421 (release 3.0) 

Gathering cutoffs: 20 20 

Trusted cutoffs: 20.90 20.90 

Noise cutoffs: -44.40 -44.40 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 96079974 

Reference Title: A dual-specificity pseudouridine synthase:
an Escherichia 

Reference Title: coli synthase purified and cloned on the 
basis of its 

Reference Title: specificity for psi 746 in 23S RNA is also 
specific for psi 

Reference Title: 32 in tRNA(phe). 

Reference Author: Wrzesinski J, Nurse K, Bakin A, Lane BG,
Ofengand J;

Reference Location: RNA 1995;1:437-448.
Database Reference: PROSITE; PDOC00869
Database Reference: PROSITE; PDOC00885
Database Reference INTERPRO; IPR000613;
Database reference: PFAMB; PB041160;
Database reference: PFAMB; PB041232;
Comment: Members of this family are involved in 
modifying bases in RNA molecules. 

Comment: They carry out the conversion of uracil 

bases to pseudouridine. This family 

Comment: includes RluD Swiss: P33643, a 

pseudouridylate synthase that converts 

Comment: specific uracils to pseudouridine in 23S 

rRNA. RluA from E. coli

Comment: converts bases in both rRNA and tRNA [1]. 
Number of members: 78 


PWI




PWI domain 


Accession number: PF01480

Definition: PWI domain 

Author: Bateman A 

Alignment method of seed: Clustalw_manual

Source of seed members: [1]

Gathering cutoffs: 25 25 

Trusted cutoffs: 64.40 64.40 

Noise cutoffs: -3.50 -3.50 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]
Reference Medline: 10322432

Reference Title: The PWI motif: a new protein domain in
splicing factors.

Reference Author: Blencowe BJ, Ouzounis CA;
Reference Location: Trends Biochem Sci 1999;24:179-180.
Database Reference INTERPRO; IPR002483; 
Number of members: 11 
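
The family entries in this table share a fixed set of field labels, with long values wrapped onto continuation lines. The following is a minimal Python sketch of how one entry might be read into a dictionary. The label list is collected from the entries in this table; the function name and the choice to store every field as a list of strings are illustrative assumptions.

```python
# Field labels as they appear in the entries of this table.
FIELDS = (
    "Accession number", "Definition", "Previous Pfam IDs", "Author",
    "Alignment method of seed", "Source of seed members",
    "Gathering cutoffs", "Trusted cutoffs", "Noise cutoffs",
    "HMM build command line", "Reference Number", "Reference Medline",
    "Reference Title", "Reference Author", "Reference Location",
    "Database Reference", "Comment", "Number of members",
)


def parse_entry(lines):
    """Group the lines of one entry by field label.

    A line that does not start with a known label is treated as a
    continuation of the most recent value.
    """
    entry = {}
    last_key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        lowered = line.lower()
        label = next((f for f in FIELDS if lowered.startswith(f.lower())), None)
        if label is None:
            if last_key is not None:
                entry[last_key][-1] += " " + line
            continue
        value = line[len(label):].lstrip(" :;")
        entry.setdefault(label, []).append(value)
        last_key = label
    return entry


sample = """
Accession number: PF01480
Definition: PWI domain
Reference Title: The PWI motif: a new protein domain in
splicing factors.
Database Reference INTERPRO; IPR002483;
Number of members: 11
""".splitlines()

parsed = parse_entry(sample)
print(parsed["Reference Title"])
```

Matching on known labels instead of splitting at the first colon keeps the "Database Reference" lines, which carry no colon, from being mistaken for continuation text.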


R3H 




R3H domain 


Accession number: PF01424 

Definition: R3H domain 

Author: Bateman A 

Alignment method of seed: Manual

Source of seed members: Medline:99003905

Gathering cutoffs: 25 25 

Trusted cutoffs: 59.30 59.30 

Noise cutoffs: 5.10 5.10 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 99003905 

Reference Title: The R3H motif: a domain that binds single- 
stranded nucleic 
Reference Title: acids. 
Reference Author: Grishin NV; 

Reference Location: Trends Biochem Sci 1998;23:329-330.
Database Reference INTERPRO; IPR001374; 
Database reference: PFAMB; PB041444; 
Comment: The name of the R3H domain comes from 
the characteristic spacing 

Comment: of the most conserved arginine and histidine 
residues. The 

Comment: function of the domain is predicted to be 

binding ssDNA. 

Number of members: 28 


RepB_protein




initiator RepB protein 


Accession number: PF01051 

Definition: initiator RepB protein 

Author: Finn RD, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_313 (release 3.0)

Gathering cutoffs: 14 14

Trusted cutoffs: 19.00 16.20

Noise cutoffs: 11.80 12.90

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98284148 

Reference Title: Replication and control of circular bacterial 
plasmids.

Reference Author: del Solar G, Giraldo R, Ruiz-Echevarria
MJ, Espinosa M, 

Reference Author: Diaz-Orejas R; 

Reference Location: Microbiol Mol Biol Rev 1998;62:434-464. 
Reference Number: [2] 
Reference Medline: 97324207 

Reference Title: Initiation of replication of plasmid pMV158: 
mechanisms of 

Reference Title: DNA strand-transfer reactions mediated by 
the initiator 

Reference Title: RepB protein. 

Reference Author: Moscoso M, Eritja R, Espinosa M;

Reference Location: J Mol Biol 1997;268:840-856.

Database Reference INTERPRO; IPR000525;

Database Reference PDB; 1rep C; 198; 240; 

Database reference: PFAMB; PB000509; 

Comment: This protein is an initiator of plasmid 

replication. 

Comment: RepB possesses nicking-closing 
(topoisomerase I) like activity. 

Comment: It is also able to perform a strand transfer
reaction on ssDNA 

Comment: that contains its target. 
Number of members: 51 
Rhomboid




Rhomboid family 


Accession number: PF01694 

Definition: Rhomboid family 

Author: Sohrmann M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1399 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 143.60 143.60

Noise cutoffs: -43.60 -43.60

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 90249726 

Reference Title: rhomboid, a gene required for dorsoventral 
axis 

Reference Title: establishment and peripheral nervous 
system development in 

Reference Title: Drosophila melanogaster. 
Reference Author: Bier E, Jan LY, Jan YN;
Reference Location: Genes Dev 1990;4:190-203.
Database Reference INTERPRO; IPR002610;
Database reference: PFAMB; PB041113;
Comment: This family contains integral membrane 
proteins that are 

Comment: related to Drosophila rhomboid protein 
Swiss:P20350. Members

Comment: of this family are found in bacteria and 
eukaryotes. These 

Comment: proteins contain three strongly conserved 
histidines in the 

Comment: putative transmembrane regions that may 
be involved in the 

Comment: as yet unknown function of these proteins. 
Number of members: 27 


Ribosomal_L18ae




Ribosomal L18ae protein
family


Accession number: PF01775

Definition: Ribosomal L18ae protein family

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST Q02543 

Gathering cutoffs: 25 25 

Trusted cutoffs: 136.70 136.70

Noise cutoffs: -99.80 -99.80

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002670;

Number of members: 11


Ribosomal_L21p


PDOC00899 


Ribosomal protein L21 
signature 


Ribosomal protein L21 is one of the proteins from the large 
ribosomal subunit. 

In Escherichia coli, L21 is known to bind to the 23S rRNA in the 
presence of 

L20. It belongs to a family of ribosomal proteins which, on the 
basis of 

sequence similarities, groups: 

- Eubacterial L21.

- Marchantia polymorpha chloroplast L21.

- Cyanelle L21.

- Spinach chloroplast L21 (nuclear-encoded). 

Eubacterial L21 is a protein of about 100 amino-acid residues, the 
mature form 

of the spinach chloroplast L21 has 200 residues. As a signature 
pattern, we 

selected a conserved region located in the C-terminal section 

of these 

proteins. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

July 1999 / Pattern and text revised. 
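
The PROSITE entries in this table give their signatures as consensus patterns. As an illustration only, the Python sketch below translates such a pattern into a regular expression and searches a sequence with it. It assumes the conventional reading of the notation: x is any residue, square brackets list the allowed residues, curly braces list excluded residues, and a number or range in parentheses is a repeat count. The pattern string is the cleaned reading of the L21 signature above, the toy sequence is invented, and the anchor symbols < and > are not handled.

```python
import re

# One pattern element: a core followed by an optional (n) or (n,m) repeat.
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        core, low, high = ELEMENT.fullmatch(element).groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


L21_PATTERN = "[IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]"
L21_REGEX = prosite_to_regex(L21_PATTERN)

# Hypothetical toy sequence built to contain one match.
toy = "MM" + "IAAAKAAAKKAAAAAAGHRRAAS" + "MM"
hit = re.search(L21_REGEX, toy)
print(L21_REGEX)
print(hit.start(), hit.end())
```

The same function accepts ranges such as x(1,3) or x(0,1), which appear in other signatures in this table.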


Ribosomal_L22e




Ribosomal L22e protein 
family 


Accession number: PF01776

Definition: Ribosomal L22e protein family 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST P56628 

Gathering cutoffs: 25 25 

Trusted cutoffs: 262.80 262.80 

Noise cutoffs: -52.00 -52.00 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002671;

Number of members: 11


Ribosomal_L27e




Ribosomal L27e protein 
family 


Accession number: PF01777 

Definition: Ribosomal L27e protein family

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: PSI-BLAST P51419

Gathering cutoffs: 25 25 

Trusted cutoffs: 326.90 326.90 

Noise cutoffs: -47.80 -47.80 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Database Reference INTERPRO; IPR001141;

Number of members: 9 


Ribosomal_L29


PDOC00501 


Ribosomal protein L29 
signature 


Ribosomal protein L29 is one of the proteins from the large
ribosomal subunit. 

L29 belongs to a family of ribosomal proteins which, on the basis 
of sequence 
similarities [1], groups: 

- Eubacterial L29.

- Red algal L29. 

- Archaebacterial L29. 

- Mammalian L35 

- Caenorhabditis elegans L35 (ZK652.4). 

- Yeast L35. 

L29 is a protein of 63 to 138 amino-acid residues. As a signature 
pattern, we 

selected a conserved region located in the central section of L29. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [KNQS]-[PSTLN]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 2. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 


Ribosomal_L31e


PDOC00881


Ribosomal protein L31e
signature 


A number of eukaryotic and archaebacterial ribosomal proteins 
can be grouped 

on the basis of sequence similarities. One of these families 
consists of: 

- Mammalian L31 [1]. 

- Chlamydomonas reinhardtii L31.

- Yeast L34.

- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues. As a
signature pattern, we 

selected a conserved region located in the central section. 

Description of pattern(s) and/or profile(s)

Consensus pattern V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1999 / Pattern and text revised.

References

[1]

Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K.
Eur. J. Biochem. 162:45-48(1987). 

[2] 

Bergmann U., Arndt E. 

Biochim. Biophys. Acta 1050:56-60(1990). 


Ribosomal_L35Ae 


PDOC00849 


Ribosomal protein L35Ae 
signature 


A number of eukaryotic and archaebacterial ribosomal proteins
can be grouped 

on the basis of sequence similarities. One of these families 
consists of: 

- Vertebrate L35A. 

- Caenorhabditis elegans L35A (F10E7.7).

- Yeast L37A/L37B (Rp47).

- Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues. As a
signature pattern, we

selected a highly conserved stretch of 22 residues in the
C-terminal part of
these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995). 


Ribosomal_L35p 


PDOC00721 


Ribosomal protein L35 
signature 


Ribosomal protein L35 is one of the proteins from the large 
subunit of the 

ribosome. It belongs to a family of ribosomal proteins which, on 
the basis of 

sequence similarities [1], groups: 

- Eubacterial L35. 

- Plant chloroplast L35 (nuclear-encoded).

- Red algal chloroplast L35.

- Cyanelle L35.

L35 is a basic protein of 60 to 70 amino-acid residues. As a
signature pattern

we selected a conserved region in the N-terminal section.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVM]-K-[TV]-x(2)-[GSA]-[SAILV]-x-K-R-[LIVMFY]-[KRLS]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised.

References 

[1] 

Otaka E., Hashimoto T., Mizuta K. 
Protein Seq. Data Anal. 5:285-300(1993). 


Ribosomal_L36e 


PDOC00916 


Ribosomal protein L36e 
signature 


A number of eukaryotic ribosomal proteins can be grouped on
the basis of

sequence similarities. One of these families consists of:

- Mammalian L36 [1]. 

- Drosophila L36 (M(1)1B). 

- Caenorhabditis elegans L36 (F37C12.4). 

- Candida albicans L39. 

- Yeast YL39. 

These proteins have 99 to 104 amino acids. As a signature 
pattern, we 

selected a conserved region in the central part of these proteins. 

Description of pattern(s) and/or profile(s)

Consensus pattern P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / First entry. 

References 

[1]

Chan Y.-L., Paz V., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 192:849-853(1993). 


Ribosomal_L37ae 




Ribosomal L37ae protein
family


Accession number: PF01780

Definition: Ribosomal L37ae protein family

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST P54051 

Gathering cutoffs: 25 25 

Trusted cutoffs: 145.10 145.10 

Noise cutoffs: -46.90 -46.90 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002674;

Comment: This ribosomal protein is found in

archaebacteria and

Comment: eukaryotes. It contains four conserved
cysteine

Comment: residues that may bind to zinc.
Number of members: 15


Ribosomal_L37e 


PDOC00827 


Ribosomal protein L37e
signature


A number of eukaryotic and archaebacterial ribosomal proteins
can be grouped 

on the basis of sequence similarities. One of these families 
consists of: 

- Mammalian L37 [1]. 

- Leishmania infantum L37 [2]. 

- Fission yeast YL35 [3]. 

- Halobacterium marismortui L37e (L35e) [4]. 

These proteins have 56 to 96 amino-acid residues. As a 
signature pattern, we 
selected a highly conserved region located in the N-terminal

part of these 

proteins. 

Description of pattern(s) and/or profile(s) 

Consensus pattern G-T-x-[SA]-x-G-x-[KR]-x(3)-[STLR]-x(0,1)-H-x(2)-C-x-R-C-G

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1999 / Pattern and text revised.

References

[1]

Chan Y.-L., Paz V., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 192:590-596(1993).

[2]

Myler P.J., Tripp C.A., Thomas L., Venkataraman G.M., Merlin G.,
Stuart K.

Mol. Biochem. Parasitol. 62:147-152(1993).

[3]

Otaka E., Higo K.-I., Itoh T.

Mol. Gen. Genet. 191:519-524(1983).

[4] 

Bergmann U., Wittmann-Liebold B. 
Biochim. Biophys. Acta 1173:195-200(1993). 


Ribosomal_L38e




Ribosomal L38e protein 
family 


Accession number: PF01781

Definition: Ribosomal L38e protein family 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST P23411

Gathering cutoffs: 25 25

Trusted cutoffs: 127.60 127.60

Noise cutoffs: -24.50 -24.50

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 91207349

Reference Title: The primary structure of rat ribosomal
protein L38.

Reference Author: Kuwano Y, Olvera J, Wool IG;
Reference Location: Biochem Biophys Res Commun 
1991;175:551-555. 

Database Reference INTERPRO; IPR002675; 
Number of members: 8 


Ribosomal_L39 


PDOC00050 


Ribosomal protein L39e 
signature 


A number of eukaryotic and archaebacterial ribosomal proteins
can be grouped 

on the basis of sequence similarities. One of these families 
consists of: 

- Mammalian L39 [1]. 

- Plants L39. 

- Yeast L46 [2]. 

- Archaebacterial L39e [3].

These proteins are very basic. About 50 residues long, they are 
the smallest 

proteins of eukaryotic-type ribosomes. As a signature pattern, 
we selected a 

conserved region in the C-terminal section of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

July 1998 / Pattern and text revised.
References 

[1] 

Lin A., McNally J., Wool I.G. 

J. Biol. Chem. 259:487-490(1984). 

[2] 

Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager 
W.H., Planta R.J. 

Nucleic Acids Res. 13:701-709(1985). 
[3] 

Ramirez C, Louie K.A., Matheson A.T. 
FEBS Lett. 250:416-418(1989).


Ribosomal_L4


PDOC00724


Ribosomal protein L1e
signature 


A number of eukaryotic and archaebacterial ribosomal proteins 
can be grouped 

on the basis of sequence similarities. One of these families 
consists [1,2,3,4] of:

- Vertebrate L1 (L4).

- Drosophila L1.

- Plant L1.

- Yeast L2 (Rp2). 

- Fission yeast L2. 

- Halobacterium marismortui HmaL4 (HL6). 

- Methanococcus jannaschii MJ0177. 

These proteins have 246 (archaebacteria) to 427 (human) 
amino acids. As a 

signature pattern, we selected a conserved region in the N- 
terminal part of 
these proteins. 

Description of pattern(s) and/or profile(s) 

Consensus pattern N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[STHSGA]-x(7)-[RK]-[GS]-H

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1997 / Pattern and text revised. 
References

[1]

Rafti F., Gargiulo G., Manzi A., Malva C., Graziani F.
Nucleic Acids Res. 17:456-456(1989).

[2]

Presutti C., Villa T., Bozzoni I.

Nucleic Acids Res. 21:3900-3900(1993).

[3]

Bagni C., Mariottini P., Annesi F., Amaldi F.

Biochim. Biophys. Acta 1216:475-478(1993).

[4]

Arndt E., Kroemer W., Hatakeyama T.

J. Biol. Chem. 265:3034-3039(1990).


Ribosomal_S20p




Ribosomal protein S20


Accession number: PF01649

Definition: Ribosomal protein S20

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1685 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 57.30 57.30

Noise cutoffs: -25.50 -25.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 88230452

Reference Title: Interaction of proteins S16, S17 and S20
with 16 S

Reference Title: ribosomal RNA.

Reference Author: Stern S, Changchien LM, Craven GR,
Noller HF;

Reference Location: J Mol Biol 1988;200:291-299.

Database Reference INTERPRO; IPR002583;

Comment: Bacterial ribosomal protein S20 interacts
with 16S rRNA [1].

Number of members: 29



Attorney No. 2750-1237P



Pfam

Prosite

Full Name

Description



Ribosomal_S27e


PDOC00898


Ribosomal protein S27e
signature


A number of eukaryotic and archaebacterial ribosomal proteins
can be grouped

on the basis of sequence similarities. One of these families
consists of [1]:

- Mammalian S27 (human S27 was originally known as
metallopan-stimulin 1).

- Chlamydomonas reinhardtii S27.

- Entamoeba histolytica S27.

- Yeast S27.

- Archaebacterial S27e.

These proteins have from 62 to 87 amino acids. They contain, in
their central

section, a putative zinc-finger region of the type
C-x(2)-C-x(14)-C-x(2)-C. We

have selected that region as a signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern [QKT]-C-x(2)-C-x(6)-F-[GSD]-x-[PSA]-x(5)-C-x(2)-C-[GSA]-x(2)-[LV]-x(2)-P-x-G
[The four C's are potential zinc ligands]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

December 1999 / Pattern and text revised.
References

[1]

Chan Y.-L., Suzuki K., Olvera J., Wool I.G.
Nucleic Acids Res. 21:649-655(1993).


Ribosomal_S3_C


PDOC00474


Ribosomal protein S3
signature



Ribosomal protein S3 is one of the proteins from the small 
ribosomal subunit. 

In Escherichia coli, S3 is known to be involved in the binding of 
initiator 

Met-tRNA. It belongs to a family of ribosomal proteins which, on 
the basis of 

sequence similarities [1], groups: 



- Eubacterial S3. 

- Algal and plant chloroplast S3. 

- Cyanelle S3. 

- Archaebacterial S3. 

- Plant mitochondrial S3. 

- Vertebrate S3. 

- Insect S3. 

- Caenorhabditis elegans S3 (C23G10.3). 
- Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues. As signature 
patterns, we 

selected a conserved region located in the C-terminal section. 
Description of pattern(s) and/or profile(s)

Consensus pattern [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-[GS]

Sequences known to belong to this class detected by the pattern

ALL, except for some mitochondrial S3.

Other sequence(s) detected in SWISS-PROT NONE.

Expert(s) to contact by email

Hallick R.B. hallick@arizona.edu

Last update

December 1999 / Pattern and text revised.

References

[1]

Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 


Ribosomal_S3_N


PDOC00474 


Ribosomal protein S3 
signature 


Ribosomal protein S3 is one of the proteins from the small
ribosomal subunit.

In Escherichia coli, S3 is known to be involved in the binding of
initiator 

Met-tRNA. It belongs to a family of ribosomal proteins which, on 
the basis of 

sequence similarities [1], groups: 

- Eubacterial S3. 

- Algal and plant chloroplast S3.

- Cyanelle S3.

- Archaebacterial S3.

- Plant mitochondrial S3. 

- Vertebrate S3. 

- Insect S3. 

- Caenorhabditis elegans S3 (C23G10.3). 
- Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues. As signature
patterns, we

selected a conserved region located in the C-terminal section.

Description of pattern(s) and/or profile(s)

Consensus pattern [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-[GS]

Sequences known to belong to this class detected by the pattern 

ALL, except for some mitochondrial S3. 

Other sequence(s) detected in SWISS-PROT NONE. 

Expert(s) to contact by email 

Hallick R.B. hallick@arizona.edu

Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 


RimM 




RimM 


Accession number: PF01782 

Definition: RimM 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: PSI-BLAST P51419

Gathering cutoffs: 25 25

Trusted cutoffs: 49.00 49.00

Noise cutoffs: -66.10 -66.10

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 98083058 

Reference Title: RimM and RbfA are essential for efficient 
processing of 16S 

Reference Title: rRNA in Escherichia coli. 

Reference Author: Bylund GO, Wipemo LC, Lundberg LA,
Wikstrom PM;

Reference Location: J Bacteriol 1998;180:73-82.
Database Reference INTERPRO; IPR002676; 
Comment: The RimM protein is essential for efficient 
processing of 16S rRNA [1]. 

Comment: The RimM protein was shown to have 
affinity for free ribosomal 30S 

Comment: subunits but not for 30S subunits in the 70S 
ribosomes [1].

Number of members: 14
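
Each Pfam record in this listing carries three pairs of bit-score thresholds (gathering, trusted and noise cutoffs). The sketch below shows one way such a line could be read and applied to a candidate hit. It assumes, following the usual Pfam convention, that the first number is a whole-sequence threshold and the second a per-domain threshold; the names `Cutoffs`, `parse_cutoffs` and `passes_cutoffs` are hypothetical and not part of HMMER or Pfam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    sequence: float  # first value on the line (assumed: whole-sequence bit score)
    domain: float    # second value on the line (assumed: per-domain bit score)


def parse_cutoffs(line: str) -> Cutoffs:
    """Parse a line such as 'Gathering cutoffs: 25 25' into a Cutoffs pair."""
    _, _, values = line.partition(":")
    sequence, domain = (float(v) for v in values.split())
    return Cutoffs(sequence, domain)


def passes_cutoffs(cutoffs: Cutoffs, sequence_score: float, domain_score: float) -> bool:
    """True when both scores reach their respective thresholds."""
    return sequence_score >= cutoffs.sequence and domain_score >= cutoffs.domain


# Thresholds quoted in the RimM (PF01782) record above.
gathering = parse_cutoffs("Gathering cutoffs: 25 25")
noise = parse_cutoffs("Noise cutoffs: -66.10 -66.10")
```

Under these assumptions, a hit scoring at the trusted level (49.00, 49.00) passes the gathering thresholds, while one scoring at the noise level does not.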




RNA_dep_RNA_pol




RNA dependent RNA
polymerase


Accession number: PF00680 

Definition: RNA dependent RNA polymerase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_32 (release 2.1)

Gathering cutoffs: -127 -127

Trusted cutoffs: -117.00 -117.00

Noise cutoffs: -137.30 -137.30

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference: SCOP; 1rdr; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR001205;
Database Reference PDB; 1rdr ; 12; 37;
Database Reference PDB; 1rdr ; 182; 460;
Database Reference PDB; 1rdr ; 67; 97;
Database reference: PFAMB; PB039844;
Database reference: PFAMB; PB040630;
Database reference: PFAMB; PB040631;
Database reference: PFAMB; PB040844; 
Database reference: PFAMB; PB041022; 
Database reference: PFAMB; PB041498; 
Number of members: 271 
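
The PDB cross-reference lines in these records have the form `Database Reference PDB; <id> <chain>; <start>; <end>;`. As a hedged sketch, the following Python extracts those four fields. It assumes the two numbers are the first and last residue positions of the region mapped onto the structure, and that the line has been cleaned of stray spaces inside the identifier; `PdbRef`, `parse_pdb_reference` and `span_length` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class PdbRef(NamedTuple):
    pdb_id: str
    chain: Optional[str]   # None when the record gives no chain letter
    start: int
    end: int


_PDB_LINE = re.compile(
    r"^Database Reference:?\s+PDB;\s*(\w{4})\s*([A-Za-z])?\s*;\s*(\d+)\s*;\s*(\d+)\s*;\s*$"
)


def parse_pdb_reference(line: str) -> Optional[PdbRef]:
    """Return the parsed fields, or None if the line is not a PDB cross-reference."""
    match = _PDB_LINE.match(line.strip())
    if match is None:
        return None
    pdb_id, chain, start, end = match.groups()
    return PdbRef(pdb_id, chain, int(start), int(end))


def span_length(ref: PdbRef) -> int:
    """Number of residues covered, counting both end points."""
    return ref.end - ref.start + 1
```

Lines for other databases (for example the PFAMB entries) do not match the expression and yield `None`, so the function can be applied to every line of a record without pre-filtering.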












RNA_dep_RNApol2 




RNA dependent RNA 
polymerase 


Accession number: PF00978 

Definition: RNA dependent RNA polymerase 

Author: Finn RD, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_13 (release 3.0)

Gathering cutoffs: 8.5 0 

Trusted cutoffs: 8.50 0.20 

Noise cutoffs: 8.40 8.40 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 93188140

Reference Title: Roles of nonstructural polyproteins and
cleavage products

Reference Title: in regulating Sindbis virus RNA replication
and

Reference Title: transcription.
Reference Author: Lemm JA, Rice CM;
Reference Location: J Virol 1993;67:1916-1926.
Reference Number: [2]
Reference Medline: 96323143

Reference Title: Complete replication in vitro of tobacco 
mosaic virus RNA 

Reference Title: by a template-dependent, membrane-bound 
RNA polymerase. 

Reference Author: Osman TA, Buck KW; 

Reference Location: J Virol 1996;70:6227-6234.

Reference Number: [3]

Reference Medline: 94047331

Reference Title: Bromovirus RNA replication and
transcription require

Reference Title: compatibility between the polymerase- and
helicase-like

Reference Title: viral RNA synthesis proteins.
Reference Author: Dinant S, Janda M, Kroner PA, Ahlquist P;
Reference Location: J Virol 1993;67:7181-7189.
Reference Number: [4]
Reference Medline: 94094568

Reference Title: Evolution and taxonomy of positive-strand
RNA viruses:

Reference Title: implications of comparative analysis of
amino acid

Reference Title: sequences.

Reference Author: Koonin EV, Dolja VV;

Reference Location: Crit Rev Biochem Mol Biol 1993;28:375-430.

Database Reference INTERPRO; IPR001788;
Database reference: PFAMB; PB000096;
Database reference: PFAMB; PB006751;
Comment: This family may represent an RNA 
dependent RNA polymerase. 

Comment: The family contains the following proteins: 
Comment: 2A protein from bromoviruses 
Comment: putative RNA dependent RNA polymerase 
from tobamoviruses 

Comment: Non structural polyprotein from togaviruses
Number of members: 125


RNA_pol 


PDOC00410 


Bacteriophage-type RNA 
polymerase family active 
site signatures 


Many forms of RNA polymerase (EC 2.7.7.6) are known. Most
RNA polymerases are multimeric enzymes, but there is a family
of single chain polymerases, which are evolutionarily related,
and which originate from bacteriophages or from mitochondria.
The RNA polymerases that belong to this family are [1]:

- Podoviridae bacteriophages T3, T7, and K11 polymerase.

- Bacteriophage SP6 polymerase.

- Vertebrate mitochondrial polymerase (gene POLRMT).

- Fungal mitochondrial polymerase (gene RPO41).

- Polymerases encoded on mitochondrial linear DNA plasmids
in various fungi and plants: Agaricus bitorquis pEM,
Claviceps purpurea pClK1, Neurospora crassa Kalilo,
Neurospora intermedia Maranhar and maize S-2.

Two conserved aspartates and one lysine residue have been
shown [2,3] to be part of the active site of T7 polymerase. We
have used the regions around the first aspartate and around the
lysine as signature patterns for this family of polymerases.

Description of pattern (s) and/or profile(s) 

Consensus pattern P-[LIVM]-x(2)-D-[GA]-[ST]-[ACHSN]-[GA]-
[LIVMFY]-Q [D is the active site residue] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-
Y [K is the active site residue] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1999 / Text revised.

References 

[1]

McAllister W.T., Raskin C.A. 
Mol. Microbiol. 10:1-6(1993). 

[2] 

Maksimova T.G., Mustayev A.A., Zaychikov E.F., Lyakhov D.L., 
Tunitskaya V.L., Akbarov A.K., Luchin S.V., Rechinsky V.O., 
Chernov B.K., Kochetkov S.N. 
Eur. J. Biochem. 195:841-847(1991). 
[3]

Sousa R., Chung Y.J., Rose J.P., Wang B.-C.
Nature 364:593-599(1993). 


RNA_pol_A




RNA polymerase alpha
subunit 


Accession number: PF00623 

Definition: RNA polymerase alpha subunit 

Author: Bateman A 

Alignment method of seed: HMM_built_from_alignment

Source of seed members: Pfam-B_3 (release 2.1)

Gathering cutoffs: 9 0

Trusted cutoffs: 13.50 2.90

Noise cutoffs: 8.50 8.50

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97066998 

Reference Title: Structural modules of the large subunits of 
RNA polymerase. 

Reference Title: Introducing archaebacterial and chloroplast 
split sites in 

Reference Title: the beta and beta' subunits of Escherichia 
coli RNA 

Reference Title: polymerase. 

Reference Author: Severinov K, Mustaev A, Kukarin A,
Muzzin O, Bass I, Darst

Reference Author: SA, Goldfarb A;

Reference Location: J Biol Chem 1996;271:27969-27974.

Database Reference INTERPRO; IPR000722;

Database reference: PFAMB; PB003218;

Comment: -!- RNA polymerases catalyse the DNA
dependent polymerisation

Comment: of RNA. Prokaryotes contain a single RNA
polymerase

Comment: compared to three in eukaryotes (not
including mitochondrial

Comment: and chloroplast polymerases).
Comment: -!- Members of this family include:
Comment: A subunit from eukaryotes
Comment: gamma subunit from cyanobacteria
Comment: beta' subunit from eubacteria
Comment: A' subunit from archaebacteria
Comment: B" from chloroplasts 
Number of members: 202 


RNA_pol_A2




RNA polymerase 
A/beta'/A" subunit 


Accession number: PF01854 

Definition: RNA polymerase A/beta'/A" subunit 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_288 (release 4.2)

Gathering cutoffs: -120 -120

Trusted cutoffs: -116.50 -116.50

Noise cutoffs: -125.00 -125.00

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 88335550

Reference Title: Relatedness of archaebacterial RNA
polymerase core subunits

Reference Title: to their eubacterial and eukaryotic
equivalents.

Reference Author: Berghofer B, Krockel L, Kortner C, Truss
M, Schallenberg J,

Reference Author: Klein A;

Reference Location: Nucleic Acids Res 1988;16:8113-8128. 

Database Reference INTERPRO; IPR002879;

Database reference: PFAMB; PB000546;

Database reference: PFAMB; PB000846;

Database reference: PFAMB; PB000984;

Database reference: PFAMB; PB001168;

Comment: RNA polymerases catalyse the DNA
dependent polymerisation

Comment: of RNA. Prokaryotes contain a single RNA
polymerase

Comment: compared to three in eukaryotes (not
including mitochondrial

Comment: and chloroplast polymerases).

Comment: This family includes a region of about 400
amino acids.

Comment: This family includes the whole
archaebacterial A" subunit,

Comment: but only the C terminal region of the A
subunit from eukaryotes

Comment: and the beta' subunit from eubacteria.

Number of members: 105



On the basis of sequence similarities, the following bacterial and
eukaryotic proteins seem to form a family:

- Escherichia coli and related bacteria ribonuclease II (EC
3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exonuclease
involved in mRNA decay. It degrades mRNA by hydrolyzing
single-stranded polyribonucleotides processively in the 3' to 5'
direction.

- Bacterial ribonuclease R [2], a 3'-5' exoribonuclease that
participates in an essential cell function.

- Yeast protein SSD1 (or SRK1), which is implicated in the control
of the cell cycle G1 phase.

- Yeast protein DIS3 [3], which binds to ran (GSP1) and
enhances the nucleotide-releasing activity of RCC1 on ran.

- Fission yeast protein dis3, which is implicated in mitotic control.

- Neurospora crassa cyt-4, a mitochondrial protein required for
RNA 5' and 3' end processing and splicing.

- Yeast protein MSU1, which is involved in mitochondrial
biogenesis.

- Synechocystis strain PCC 6803 protein zam [4], which controls
resistance to the carbonic anhydrase inhibitor acetazolamide.

- Caenorhabditis elegans hypothetical protein F48E8.6.

The size of these proteins ranges from 644 residues (rnb) to 1250
(SSD1). While their sequence is highly divergent, they share a
conserved domain in their C-terminal section [5]. It is possible
that this domain plays a role in a putative exonuclease function
that would be common to all these proteins. We have developed a
signature pattern based on the core of this conserved domain.



Description of pattern (s) and/or profile(s) 

Consensus pattern [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STALV]-
x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-[RQ]-[KR]-[FY]-x-D-x(3)-[HQ]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).

[2]

Cheng Z.-F., Zuo Y., Li Z., Rudd K.E., Deutscher M.P.
J. Biol. Chem. 273:14077-14080(1998).

[3]

Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M.,
Nakashima N., Yanagida M., He X., Mueller U., Sazer S.,
Nishimoto T.
EMBO J. 15:5595-5605(1996).

[4]

Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).

[5]

Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).


RRF 




Ribosome recycling 
factor 


Accession number: PF01765 

Definition: Ribosome recycling factor 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_949 (release 4.2)

Gathering cutoffs: -35 -35 

Trusted cutoffs: -34.90 -34.90 

Noise cutoffs: -76.20 -76.20 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 94240115

Reference Title: Ribosome recycling factor (ribosome
releasing factor) is

Reference Title: essential for bacterial growth.
Reference Author: Janosi L, Shimizu I, Kaji A;
Reference Location: Proc Natl Acad Sci U S A 1994;91:4249-4253.

Database Reference INTERPRO; IPR002661;

Comment: The ribosome recycling factor (RRF /
ribosome release factor) dissociates

Comment: the ribosome from the mRNA after
termination of translation, and is

Comment: essential for bacterial growth [1]. Thus
ribosomes are "recycled" and ready

Comment: for another round of protein synthesis.
Number of members: 27 


rve 




Integrase core domain 


Accession number: PF00665 

Definition: Integrase core domain 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_10 (release 2.1)

Gathering cutoffs: 9.3 9.3 

Trusted cutoffs: 9.30 9.30 

Noise cutoffs: 9.20 9.20 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 95099322 

Reference Title: Crystal structure of the catalytic domain of 
HIV-1 

Reference Title: integrase: similarity to other polynucleotidyl 
transferases 

Reference Title: [see comments] 

Reference Author: Dyda F, Hickman AB, Jenkins TM,
Engelman A, Craigie R,

Reference Author: Davies DR;

Reference Location: Science 1994;266:1981-1986.

Database Reference: SCOP; 2itg; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR001584;
Database Reference PDB; 1cxu A; 56; 198;
Database Reference PDB; 1vsh ; 54; 199;
Database Reference PDB; 1vsi ; 54; 199;
Database Reference PDB; 1vsj ; 54; 199;
Database Reference PDB; 1cxq A; 53; 198;
Database Reference PDB; 1a5v ; 54; 199;
Database Reference PDB; 1a5w ; 54; 199;
Database Reference PDB; 1a5x ; 54; 199;
Database Reference PDB; 1asv ; 54; 199;
Database Reference PDB; 1vsm A; 54; 199;
Database Reference PDB; 1czb A; 53; 198;
Database Reference PDB; 1asw ; 53; 201;
Database Reference PDB; 1cz9 A; 59; 197;
Database Reference PDB; 1vsk ; 54; 199;
Database Reference PDB; 1vsl A; 54; 199;
Database Reference PDB; 1asu ; 53; 207;
Database Reference PDB; 1c0m A; 53; 213;
Database Reference PDB; 1vsd ; 54; 88;
Database Reference PDB; 1vse ; 54; 88;
Database Reference PDB; 1c1a B; 55; 213;
Database Reference PDB; 1c0m B; 54; 213;
Database Reference PDB; 1c0m D; 54; 213;
Database Reference PDB; 1c1a A; 53; 213;
Database Reference PDB; 1c0m C; 53; 213;
Database Reference PDB; 1bhl ; 57; 201;
Database Reference PDB; 1bi4 B; 57; 201;
Database Reference PDB; 1bl3 B; 57; 201;
Database Reference PDB; 1b9f A; 56; 201;
Database Reference PDB; 1bis B; 56; 201;
Database Reference PDB; 1qs4 B; 56; 201;
Database Reference PDB; 1qs4 C; 56; 201;
Database Reference PDB; 1biz A; 54; 201;
Database Reference PDB; 1itg ; 55; 201;
Database Reference PDB; 1bi4 C; 53; 201;
Database Reference PDB; 1bl3 C; 53; 201;
Database Reference PDB; 2itg ; 53; 201;
Database Reference PDB; 1b9d A; 57; 189;
Database Reference PDB; 1bi4 A; 57; 201;
Database Reference PDB; 1bl3 A; 57; 201;
Database Reference PDB; 1bis A; 56; 201;
Database Reference PDB; 1biu A; 56; 201;
Database Reference PDB; 1biu B; 56; 201;
Database Reference PDB; 1biu C; 56; 201;
Database Reference PDB; 1qs4 A; 56; 201;
Database Reference PDB; 1b92 A; 56; 201;
Database Reference PDB; 1biz B; 58; 201;
Database Reference PDB; 1b9d A; 382; 390;
Database Reference PDB; 1wjb A; 53; 55;
Database Reference PDB; 1wjb B; 53; 55;
Database Reference PDB; 1wjd A; 53; 55;
Database Reference PDB; 1wjd B; 53; 55;
Database Reference PDB; 1wjf A; 53; 55;
Database Reference PDB; 1wjf B; 53; 55;
Database reference: PFAMB; PB000048;
Database reference: PFAMB; PB007709;
Database reference: PFAMB; PB013923;
Database reference: PFAMB; PB013938;
Database reference: PFAMB; PB018509;
Database reference: PFAMB; PB020302;
Database reference: PFAMB; PB025327;
Database reference: PFAMB; PB028352;
Database reference: PFAMB; PB032740;
Database reference: PFAMB; PB040612;
Database reference: PFAMB; PB040636;
Database reference: PFAMB; PB040684;
Database reference: PFAMB; PB040695;
Database reference: PFAMB; PB040730;
Database reference: PFAMB; PB040824;
Database reference: PFAMB; PB041112;
Database reference: PFAMB; PB041143;
Database reference: PFAMB; PB041275;
Database reference: PFAMB; PB041356;
Database reference: PFAMB; PB041375;
Database reference: PFAMB; PB041456;
Database reference: PFAMB; PB041459;
Database reference: PFAMB; PB041522;
Database reference: PFAMB; PB041665;
Database reference: PFAMB; PB041761;
Database reference: PFAMB; PB041816;
Database reference: PFAMB; PB041885;
Comment: Integrase mediates integration of a DNA
copy of the viral

Comment: genome into the host chromosome.
Integrase is composed of

Comment: three domains. The amino-terminal domain
is a zinc binding

Comment: domain Integrase_Zn. This domain is the
central catalytic

Comment: domain. The carboxyl terminal domain that
is a non-specific

Comment: DNA binding domain integrase.
Comment: The catalytic domain acts as an
endonuclease when two

Comment: nucleotides are removed from the 3' ends of
the blunt-ended

Comment: viral DNA made by reverse transcription.
This domain also

Comment: catalyses the DNA strand transfer reaction
of the 3' ends

Comment: of the viral DNA to the 5' ends of the
integration site [1].

Number of members: 1147 


S4 




S4 domain 


Accession number: PF01479 

Definition: S4 domain 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Medline:99193178

Gathering cutoffs: 17 17

Trusted cutoffs: 17.20 17.20

Noise cutoffs: 16.70 16.70

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 99193178

Reference Title: Novel predicted RNA-binding domains 
associated with the 

Reference Title: translation machinery. 
Reference Author: Aravind L, Koonin EV; 
Reference Location: J Mol Evol 1999;48:291-302.
Reference Number: [2]
Reference Medline: 98372721

Reference Title: The crystal structure of ribosomal protein
S4 reveals a 

Reference Title: two-domain molecule with an extensive 
RNA-binding surface: 

Reference Title: one domain shows structural homology to 
the ETS DNA-binding 
Reference Title: motif. 

Reference Author: Davies C, Gerstner RB, Draper DE, 
Ramakrishnan V, White SW; 

Reference Location: EMBO J 1998;17:4545-4558.
Database Reference: SCOP; 1c06; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002942;

Database Reference PDB; 1c05 A; 51; 98;

Database Reference PDB; 1c06 A; 51; 98;

Database Reference PDB; 1dm9 A; 9; 55;

Database Reference PDB; 1dm9 B; 9; 55;

Database reference: PFAMB; PB001751;

Database reference: PFAMB; PB041147;

Database reference: PFAMB; PB041148;

Comment: The S4 domain is a small domain consisting 

of 60-65 amino acid residues 

Comment: that was detected in the bacterial ribosomal
protein S4, eukaryotic

Comment: ribosomal S9, two families of pseudouridine
synthases, a novel family 

Comment: of predicted RNA methylases, a yeast 
protein containing a pseudouridine 

Comment: synthetase and a deaminase domain, 
bacterial tyrosyl-tRNA synthetases, 

Comment: and a number of uncharacterized, small 
proteins that may be involved in 

Comment: translation regulation [1]. The S4 domain 
probably mediates binding to 
Comment: RNA. 
Number of members: 256


SAA_proteins


PDOC00762 


Serum amyloid A 
proteins signature 


The serum amyloid A (SAA) proteins comprise a family of
vertebrate proteins that associate predominantly with high
density lipoproteins (HDL) [1,2]. The synthesis of certain
members of the family is greatly increased (as much as
1000-fold) in inflammation, thus making SAA a major acute
phase reactant. While the major physiological function of SAA
is unclear, prolonged elevation of plasma SAA levels, as in
chronic inflammation, results in a pathological condition,
called amyloidosis, which affects the liver, kidney and spleen
and which is characterized by the highly insoluble
accumulation of SAA in these tissues.

SAA are proteins of about 110 amino acid residues. As a
signature pattern, we selected the most highly conserved
region, which is located in the central part of the sequence.

Description of pattern(s) and/or profile(s)

Consensus pattern A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A
Sequences known to belong to this class detected by the pattern
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

June 1994 / First entry. 

References 

[1] 

Malle E., Steinmetz A., Raynes J.G.
Atherosclerosis 102:131-146(1993).

[2]

Uhlar C.M., Burgess C.J., Sharp P.M., Whitehead A.S.
Genomics 19:228-235(1994). 


SAM 




SAM domain (Sterile 
alpha motif) 


Accession number: PF00536 

Definition: SAM domain (Sterile alpha motif) 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: [1],[2] 

Gathering cutoffs: 11 0

Trusted cutoffs: 11.00 3.70

Noise cutoffs: 10.90 10.90

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96100659

Reference Title: SAM: A novel motif in yeast sterile alpha 
and Drosophila 

Reference Title: polyhomeotic proteins 

Reference Author: Ponting CP; 

Reference Location: Prot Sci 1995;4:1928-1930.

Reference Number: [2] 

Reference Medline: 97160498 

Reference Title: SAM as a protein interaction domain 
involved in 

Reference Title: developmental regulation. 

Reference Author: Schultz J, Ponting CP, Hofmann K, Bork P;

Reference Location: Prot Sci 1997;6:249-253.

Reference Number: [3]

Reference Medline: 99101382

Reference Title: The crystal structure of an Eph receptor
SAM domain reveals

Reference Title: a mechanism for modular dimerization.
Reference Author: Stapleton D, Balan I, Pawson T, Sicheri F;

Reference Location: Nat Struct Biol 1999;6:44-49.

Database reference: SMART; SAM;

Database Reference: SCOP; 1b0x; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR001660;
Database Reference PDB; 1b0x A; 910; 973;
Database Reference PDB; 1sgg ; 7; 70;
Database Reference PDB; 1b4f A; 7; 71;
Database Reference PDB; 1b4f C; 7; 71;
Database Reference PDB; 1b4f E; 7; 71;
Database Reference PDB; 1b4f D; 7; 71;
Database Reference PDB; 1b4f H; 7; 71;
Database Reference PDB; 1b4f F; 7; 71;
Database Reference PDB; 1b4f G; 7; 71;
Database Reference PDB; 1b4f B; 7; 71;
Database reference: PFAMB; PB008631;
Database reference: PFAMB; PB040678;
Database reference: PFAMB; PB041111;
Database reference: PFAMB; PB041385; 
Comment: It has been suggested that SAM is an
evolutionarily conserved protein

Comment: binding domain that is involved in the
regulation of numerous

Comment: developmental processes in diverse
eukaryotes.

Comment: The SAM domain can potentially function as
a protein interaction

Comment: module through its ability to homo- and
heterooligomerise with
Comment: other SAM domains.
Number of members: 110 


SAM_decarbox 




Adenosyl methionine 
decarboxylase 


Accession number: PF01536 

Definition: Adenosylmethionine decarboxylase

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_600 (release 4.0)

Gathering cutoffs: 11 11

Trusted cutoffs: 17.90 17.90

Noise cutoffs: 5.70 5.70 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98098079 

Reference Title: Cloning, mapping and mutational analysis 
of the 

Reference Title: S-adenosylmethionine decarboxylase gene 
in Drosophila 

Reference Title: melanogaster.

Reference Author: Larsson J, Rasmuson-Lestander A; 

Reference Location: Mol Gen Genet 1997;256:652-660. 

Database Reference: SCOP; 1jen; fa; [SCOP-USA] [CATH- 

PDBSUM] 

Database Reference INTERPRO; IPR001985; 

Database Reference PDB; 1jen C; 69; 328;

Database Reference PDB; 1jen A; 69; 329; 

Database Reference PDB; 1jen B; 4; 67; 

Database Reference PDB; 1jen D; 5; 66; 

Comment: This is a family of S-adenosylmethionine
decarboxylase (SAMDC) proenzymes.

Comment: In the biosynthesis of polyamines SAMDC
produces decarboxylated

Comment: S-adenosylmethionine, which serves as the
aminopropyl moiety necessary

Comment: for spermidine and spermine biosynthesis
from putrescine [1]. The Pfam

Comment: alignment contains both the alpha and beta
chains that are cleaved to

Comment: form the active enzyme.

Number of members: 34 


SBF 




Sodium Bile acid 
symporter family 


Accession number: PF01758

Definition: Sodium Bile acid symporter family

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_697 (release 4.2)

Gathering cutoffs: -19 -19

Trusted cutoffs: -12.50 -12.50

Noise cutoffs: -26.40 -26.40

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97377989 

Reference Title: Isolation of three contiguous genes, ACR1,
ACR2 and ACR3, 

Reference Title: involved in resistance to arsenic 
compounds in the yeast 

Reference Title: Saccharomyces cerevisiae. 

Reference Author: Bobrowicz P, Wysocki R, Owsianik G,
Goffeau A, Ulaszewski

Reference Author: S;

Reference Location: Yeast 1997;13:819-828.

Reference Number: [2]

Reference Medline: 92073340

Reference Title: Functional expression cloning and
characterization of the

Reference Title: hepatocyte Na+/bile acid cotransport
system.

Reference Author: Hagenbuch B, Stieger B, Foguet M,
Lubbert H, Meier PJ;

Reference Location: Proc Natl Acad Sci U S A 
1991;88:10629-10633. 

Database Reference INTERPRO; IPR002657; 
Database reference: PFAMB; PB041594; 
Comment: This family consists of Na+/bile acid
co-transporters.

Comment: These transmembrane proteins function in
the liver

Comment: in the uptake of bile acids from portal blood
plasma,

Comment: a process mediated by the co-transport of
Na+ [2].

Comment: Also in the family is ACR3 from S.
cerevisiae Swiss:Q06598;

Comment: this is a putative transmembrane protein
involved in

Comment: resistance to arsenic compounds [1].
Number of members: 29 


Sec7 




Sec7 domain 


Accession number: PF01369 

Definition: Sec7 domain 

Author: Bateman A 

Alignment method of seed: Clustalw_manual 

Source of seed members: Pfam-B_1629 (release 3.0) 

Gathering cutoffs: 25 25 

Trusted cutoffs: 101.50 101.50

Noise cutoffs: 13.20 13.20

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 98169075 

Reference Title: Structure of the Sec7 domain of the Arf
exchange factor

Reference Title: ARNO.

Reference Author: Cherfils J, Menetrey J, Mathieu M, Le
Bras G, Robineau S,

Reference Author: Beraud-Dufour S, Antonny B, Chardin P;
Reference Location: Nature 1998;392:101-105.
Reference Number: [2]
Reference Medline: 97100951

Reference Title: A human exchange factor for ARF contains
Sec7- and

Reference Title: pleckstrin-homology domains.

Reference Author: Chardin P, Paris S, Antonny B, Robineau
S, Beraud-Dufour S,

Reference Author: Jackson CL, Chabre M;
Reference Location: Nature 1996;384:481-484.
Database Reference: SCOP; 1pbv; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR000904;

Database Reference PDB; 1pbv ; 58; 243;

Database Reference PDB; 1bc9 ; 59; 244;

Comment: The Sec7 domain is a guanine-nucleotide-
exchange-factor (GEF)

Comment: for the arf family [2]. 

Number of members: 32 


Seedstore_2S




2S seed storage family 


Accession number: PF01631 

Definition: 2S seed storage family 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1154 (release 4.1)

Gathering cutoffs: 25 25

Trusted cutoffs: 95.10 95.10

Noise cutoffs: -0.20 10.10

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 97121264

Reference Title: 1H NMR assignment and global fold of
napin BnIb, a

Reference Title: representative 2S albumin seed protein.
Reference Author: Rico M, Bruix M, Gonzalez C, Monsalve
RI, Rodriguez R;

Reference Location: Biochemistry 1996;35:15672-15682.
Database Reference: SCOP; 1pnb; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR000617;

Database reference: PFAMB; PB029622; 

Comment: Members of this family are composed of two 

chains (both included in 

Comment: the alignment), these are co-translated and 
later cleaved. The two 

Comment: chains are disulphide linked together. 
Number of members: 27 


SH2 


PDOC50001 


Src homology 2 (SH2) 
domain profile 


The Src homology 2 (SH2) domain is a protein domain of about
100 amino-acid residues first identified as a conserved sequence
region between the oncoproteins Src and Fps [1]. Similar
sequences were later found in many other intracellular
signal-transducing proteins [2]. SH2 domains function as
regulatory modules of intracellular signalling cascades by
interacting with high affinity to phosphotyrosine-containing
target peptides in a sequence-specific and strictly
phosphorylation-dependent manner [3,4,5,6].

The SH2 domain has a conserved 3D structure consisting of
two alpha helices and six to seven beta-strands. The core of the
domain is formed by a continuous beta-meander composed of two
connected beta-sheets [7].

So far, SH2 domains have been identified in the following
proteins:

- Many vertebrate, invertebrate and retroviral cytoplasmic
(non-receptor) protein tyrosine kinases. In particular in the
Src, Abl, Btk, Csk and ZAP70 families of kinases.

- Mammalian phosphatidyl inositol-specific phospholipase C
gamma-1 and -2. Two copies of the SH2 domain are found in those
proteins in between the catalytic 'X-' and 'Y-boxes' (see
<PDOC50007>).

- Mammalian phosphatidyl inositol 3-kinase regulatory p85
subunit.

- Some vertebrate and invertebrate protein-tyrosine
phosphatases.

- Mammalian Ras GTPase-activating protein (GAP).

- Adaptor proteins mediating binding of guanine nucleotide
exchange factors to growth factor receptors: vertebrate GRB2,
Caenorhabditis elegans sem-5 and Drosophila DRK.

- Mammalian Vav oncoprotein, a guanine-nucleotide
exchange factor of the CDC24 family.

- Miscellaneous proteins interacting with vertebrate receptor
protein tyrosine kinases: oncoprotein Crk, mammalian cytoplasmic
proteins Nck, Shc.

- STAT proteins (signal transducers and activators of
transcription).

- Chicken tensin.

- Yeast transcriptional control protein SPT6.








The profile developed to detect SH2 domains is based on a
structural alignment consisting of 8 gap-free blocks and 7 linker
regions totaling 92 match positions.








Description of pattern (s) and/or profile(s) 








Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT protein tyrosine 
kinases JAK1 and JAK2. 
Expert(s) to contact by email 
Zvelebil M. marketa@ludwig.ucl.ac.uk 








Last update 

November 1995 / First entry.
ShikimateDH




Shikimate / quinate 5- 
dehydrogenase 


Accession number: PF01488 

Definition: Shikimate / quinate 5-dehydrogenase 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_336 (release 4.0)

Gathering cutoffs: -50 -50

Trusted cutoffs: -48.00 -48.00

Noise cutoffs: -82.00 -82.00

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1] 

Reference Medline: 96048023 

Reference Title: The molecular biology of multidomain proteins. Selected

Reference Title: examples.

Reference Author: Hawkins AR, Lamb HK;

Reference Location: Eur J Biochem 1995;232:7-18.

Database Reference INTERPRO; IPR002907; 

Comment: This family contains both shikimate and quinate dehydrogenases.

Comment: Shikimate 5-dehydrogenase catalyses the conversion of

Comment: shikimate to 5-dehydroshikimate. This reaction is part of

Comment: the shikimate pathway which is involved in the biosynthesis

Comment: of aromatic amino acids.

Comment: Quinate 5-dehydrogenase catalyses the conversion of

Comment: quinate to 5-dehydroquinate. This reaction is part of

Comment: the quinate pathway where quinic acid is exploited as

Comment: a source of carbon in prokaryotes and microbial

Comment: eukaryotes.

Comment: Both the shikimate and quinate pathways share two common

Comment: pathway metabolites 3-dehydroquinate and dehydroshikimate.

Number of members: 58 
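
The three cutoff lines of a Pfam-style entry can be read as bit-score thresholds. The following is a minimal sketch of how they might be applied, assuming (this is not stated in the entry itself) that a hit counts as a family member when its score is at or above the gathering cutoff, that the trusted cutoff is the lowest accepted score, and that the noise cutoff is the highest rejected score. The names `Cutoffs`, `cutoffs_consistent` and `is_member` are hypothetical and only the first value of each pair is used.

```python
from typing import NamedTuple


class Cutoffs(NamedTuple):
    """Bit-score thresholds copied from one entry (first value of each pair)."""
    gathering: float
    trusted: float
    noise: float


# Values as listed for PF01488 (Shikimate / quinate 5-dehydrogenase).
SHIKIMATE_DH = Cutoffs(gathering=-50.0, trusted=-48.0, noise=-82.0)


def cutoffs_consistent(c: Cutoffs) -> bool:
    # Assumed ordering: rejected scores < gathering threshold <= accepted scores.
    return c.noise < c.gathering <= c.trusted


def is_member(score: float, c: Cutoffs) -> bool:
    # Assumed rule: a hit is accepted when it reaches the gathering cutoff.
    return score >= c.gathering


if __name__ == "__main__":
    print(cutoffs_consistent(SHIKIMATE_DH))
    print(is_member(-49.0, SHIKIMATE_DH), is_member(-60.0, SHIKIMATE_DH))
```

Under these assumptions a hit scoring -49.0 would be accepted and one scoring -60.0 would not.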


Sigma54 factors 


PDOC00593 


Sigma-54 factors family 
signatures and profile 


Sigma factors [1] are bacterial transcription initiation factors that 
promote 

the attachment of the core RNA polymerase to specific initiation 
sites and are 

then released. They alter the specificity of promoter 
recognition. Most 

bacteria express a multiplicity of sigma factors. Two of these 
factors, sigma- 

70 (gene rpoD), generally known as the major or primary sigma 
factor, and 

sigma-54 (gene rpoN or ntrA) direct the transcription of a wide 
variety of 

genes. The other sigma factors, known as alternative sigma 
factors, are 

required for the transcription of specific subsets of genes. 

With regard to sequence similarity, sigma factors can be 
grouped into two 

classes: the sigma-54 and sigma-70 families. The sigma-70 
family has many 

different sigma factors (see the relevant entry <PDOC00592>).
The sigma-54

family consists exclusively of sigma-54 factor [2,3] required
for the 

transcription of promoters that have a characteristic -24 and -12 
consensus 

recognition element but which are devoid of the typical -10,-35 
sequences 

recognized by the major sigma factors. The sigma-54 factor 
is also 

characterized by its interaction with ATP-dependent positive 
regulatory 

proteins that bind to upstream activating sequences. 
Structurally sigma-54 factors consist of three distinct regions: 
- A relatively well conserved N-terminal glutamine-rich region of about 50
  residues that contains a potential leucine zipper motif.

- A region of variable length which is not well conserved.

- A well conserved C-terminal region of about 350 residues that contains a
  second potential leucine zipper, a potential DNA-binding 'helix-turn-helix'
  motif and a perfectly conserved octapeptide whose function is not known.

We developed two signature patterns for this family of sigma factors. The
first starts two residues before the N-terminal extremity of the
helix-turn-helix region and ends two residues before its C-terminal extremity.
The second is the conserved octapeptide. A profile has also been designed
that covers the whole C-terminal region.

Description of pattern (s) and/or profile(s) 

Consensus pattern P-[LIVM]-x-[LIVM]-x(2)-[LIVM]-A-x(2)-[LIVMFT]-x(2)-[HS]-x-S-T-[LIVM]-S-R

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern R-R-T-[IV]-[ATN]-K-Y-R

Sequences known to belong to this class detected by the pattern 

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both a signature pattern 
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 
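
For readers without dedicated pattern-scanning software, a consensus pattern of this kind can be approximated with an ordinary regular expression. The sketch below is illustrative only: `prosite_to_regex` is a hypothetical helper, it handles just the elements used in these entries (single residues, `x`, bracketed alternatives, braces for excluded residues, and repeat counts), and the test sequence is invented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue part from an optional repeat such as (2) or (2,5).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = m.group(1), m.group(2), m.group(3)
        if residue == "x":
            token = "."                            # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"     # excluded residues
        else:
            token = residue                        # literal or [ABC] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# The conserved octapeptide pattern of the sigma-54 family.
OCTAPEPTIDE = "R-R-T-[IV]-[ATN]-K-Y-R"

if __name__ == "__main__":
    regex = prosite_to_regex(OCTAPEPTIDE)
    toy_sequence = "MKRRTIAKYRQ"  # invented sequence, not a real protein
    print(regex, re.search(regex, toy_sequence) is not None)
```

A regular expression reproduces only the pattern, not the profile, so it inherits the lower sensitivity mentioned in the note above.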
Last update 

July 1999 / Patterns and text revised. 

References 

[1] 

Helmann J.D., Chamberlin M.J. 

Annu. Rev. Biochem. 57:839-872(1988). 

[2]

Thoeny B., Hennecke H. 

FEMS Microbiol. Rev. 5:341-358(1989). 

[3] 

Merrick M.J. 

Mol. Microbiol. 10:903-909(1993). 


SLH 


PDOC00823 


S-layer homology
domain signature


S-layers are paracrystalline mono-layered assemblies of 
(glyco) proteins which 

coat the surface of bacteria [1]. Several S-layer proteins and some other cell

wall proteins contain one or more copies of a domain of about 50-60 residues,

which has been called SLH (for S-layer homology) [2]. There is 
strong evidence 

that this domain serves as an anchor to the peptidoglycan [3]. 
The SLH domain 
has been found in: 

- S-layer glycoprotein of Acetogenium kivui (3 copies).

- S-layer 125 Kd protein of Bacillus sphaericus (3 copies).

- S-layer protein of Bacillus anthracis (3 copies).

- S-layer protein of Bacillus licheniformis (3 copies).

- S-layer protein (HWP) from Bacillus brevis strain HPD31 (3 copies).

- Middle cell wall protein (MWP) from Bacillus brevis strain 47 (3 copies).

- S-layer protein (p100) of Thermus thermophilus (1 copy).

- Outer membrane protein Omp-alpha from Thermotoga maritima (1 copy).

- Cellulosome anchoring protein (gene ancA), outer layer protein B (OlpB) and
  a further potential cell surface glycoprotein from Clostridium thermocellum
  (3 copies; the first copy is missing its N-terminal third which is appended
  to the end of the third copy; may have arisen by circular permutation).

- Amylopullulanase (gene amyB) from Thermoanaerobacter thermosulfurogenes (3
  copies).

- Amylopullulanase (gene aapT) from Bacillus strain XAL-601 (3 copies).

- Endoglucanase from Bacillus strain KSM-635 (3 copies).

- Exoglucanase (gene xynX) from Clostridium thermocellum (3 copies).

- Xylanase A (gene xynA) from Thermoanaerobacter saccharolyticum (2 copies; 3
  copies if a frameshift is taken into account).

- Protein involved in butirosin production (ButB) from Bacillus circulans (2
  incomplete copies; 3 copies if three frameshifts are taken into account).

- Two hypothetical proteins from Synechocystis strain PCC 6803 (1 copy each).

- A hypothetical protein with sequence similarity to amylopullulanases found
  3' of amylase gene from Bacillus circulans (fragment of 1 copy; 3 copies if
  two frameshifts are taken into account).

SLH domains are found at the N- or C-termini of mature proteins. 
They occur in 

single copy followed by a predicted coiled coil domain, or in three 
contiguous 

copies. Structurally, the SLH domain is predicted to contain two 
alpha-helices 

flanking a beta strand. The SLH sequences are fairly divergent 
with an average 

identity of about 25%. It is however possible to build a sequence 
pattern that 

starts at the second position of the domain and that spans 3/4 of 
its length. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[FYWPDA]-x(4)-[LIV]-x(2)-[GTALV]-x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-[LIVMA]-[STKR]-[RY]-x-[EQ]-x-[STALIVM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Expert(s) to contact by email 

Lupas A.N. lupas@vms.biochem.mpg.de

Last update 

November 1997 / Pattern and text revised. 

References 

[1]

Beveridge T.J. 

Curr. Opin. Struct. Biol. 4:204-212(1994). 
[2] 

Lupas A., Engelhardt H., Peters J., Santarius U., Volker S., Baumeister W.

J. Bacteriol. 176:1224-1233(1994).
[3]

Lemaire M., Ohayon H., Gounon P., Fujino T., Beguin P.
J. Bacteriol. 177:2451-2459(1995).


Smr domain


Accession number: PF01713

Definition: Smr domain

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: [1]

Gathering cutoffs: 0 0

Trusted cutoffs: 1.40 1.40

Noise cutoffs: -7.90 -7.90 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 10431172

Reference Title: Smr: a bacterial and eukaryotic homologue 
of the C-terminal 

Reference Title: region of the MutS2 family. 
Reference Author: Moreira D, Philippe H; 
Reference Location: Trends Biochem Sci 1999;24:298-300. 
Database Reference INTERPRO; IPR002625; 
Comment: This family includes the Smr (Small MutS Related) proteins,

Comment: and the C-terminal region of the MutS2 protein. It has been

Comment: suggested that this domain interacts with the MutS1

Comment: Swiss:P23909 protein in the case of Smr proteins and with

Comment: the N-terminal MutS related region of MutS2 Swiss:P94545 [1].

Number of members: 14


SRF-TF 


PDOC00302 


MADS-box domain 
signature and profile 


A number of transcription factors contain a conserved domain of 
56 amino-acid 

residues, sometimes known as the MADS-box domain [E1]. They 
are listed below: 

- Serum response factor (SRF) [1], a mammalian transcription factor that
  binds to the Serum Response Element (SRE). This is a short sequence of dyad
  symmetry located 300 bp to the 5' end of the transcription initiation site
  of genes such as c-fos.

- Mammalian myocyte-specific enhancer factors 2A to 2D (MEF2A to MEF2D).
  These proteins are transcription factors which bind specifically to the
  MEF2 element present in the regulatory regions of many muscle-specific
  genes.

- Drosophila myocyte-specific enhancer factor 2 (MEF2).

- Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of
  mating-type-specific genes.

- Yeast arginine metabolism regulation protein I (gene ARGR1 or ARG80).

- Yeast transcription factor RLM1.

- Yeast transcription factor SMP1.

- Arabidopsis thaliana agamous protein (AG) [3], a probable transcription
  factor involved in regulating genes that determine stamen and carpel
  development in wild-type flowers. Mutations in the AG gene result in the
  replacement of the stamens by petals and the carpels by a new flower.

- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and
  Pistillata (PI) which act locally to specify the identity of the floral
  meristem and to determine sepal and petal development [4].

- Antirrhinum majus and tobacco homeotic protein deficiens 
(DEFA) and globosa 

(GLO) [5]. Both proteins are transcription factors involved in the 
genetic 

control of flower development. Mutations in DEFA or GLO 
cause the 

transformation of petals into sepals and of stamina into carpels. 

- Arabidopsis thaliana putative transcription factors AGL1 to 
AGL6 [6]. 

- Antirrhinum majus morphogenetic protein DEF H33 (squamosa). 

In SRF, the conserved domain has been shown [1] to be involved 
in DNA-binding 

and dimerization. We have derived a pattern that spans the 
complete length of 

the domain. The profile also spans the length of the MADS-box. 

Description of pattern(s) and/or profile(s) 

Consensus pattern R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

Sequences known to belong to this class detected by the pattern

ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both signature patterns 
and a profile. As the profile is much more sensitive than the 
patterns, you should use it if you have access to the necessary 
software tools to do so. 
Last update 

July 1999 / Pattern and text revised.

References 

[1] 

Norman C., Runswick M., Pollock R., Treisman R.
Cell 55:989-1003(1988). 

[2] 

Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K.
J. Mol. Biol. 204:593-606(1988). 

[3] 

Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A., 
Meyerowitz E.M. 
Nature 346:35-39(1990). 

[4] 

Goto K., Meyerowitz E.M. 
Genes Dev. 8:1548-1560(1994). 

[5] 

Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E.,
Saedler H., Sommer H., Schwartz-Sommer Z.
EMBO J. 11:4693-4704(1992).

[6] 

Ma H., Yanofsky M.F., Meyerowitz E.M. 
Genes Dev. 5:484-495(1991). 

[E1] 

http://transfac.gbf-braunschweig.de/cgi-bin/qt/getEntry.pl?C0014


SRP19 




SRP19 protein 


Accession number: PF01922 
Definition: SRP19 protein 
Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Enright A

Gathering cutoffs: 25 25

Trusted cutoffs: 31.20 31.20

Noise cutoffs: -28.50 -28.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 89041541

Reference Title: Isolation and characterization of a cDNA clone encoding the

Reference Title: 19 kDa protein of signal recognition particle (SRP):

Reference Title: expression and binding to 7SL RNA.

Reference Author: Lingelbach K, Zwieb C, Webb JR, Marshallsay C, Hoben PJ,

Reference Author: Walter P, Dobberstein B;

Reference Location: Nucleic Acids Res 1988;16:9431-9442. 

Reference Number: [2] 

Reference Medline: 92220168 

Reference Title: SEC65 gene product is a subunit of the 
yeast signal 

Reference Title: recognition particle required for its integrity. 
Reference Author: Hann BC, Stirling CJ, Walter P; 
Reference Location: Nature 1992;356:532-533. 
Reference Number: [3] 
Reference Medline: 92220169

Reference Title: The S. cerevisiae SEC65 gene encodes a 
component of yeast 

Reference Title: signal recognition particle with homology to 
human SRP19. 

Reference Author: Stirling CJ, Hewitt EW; 
Reference Location: Nature 1992;356:534-537. 
Database Reference INTERPRO; IPR002778; 
Comment: The signal recognition particle (SRP) binds 
to the signal peptide of 

Comment: proteins as they are being translated. The 
binding of the SRP halts 

Comment: translation and the complex is then 
transported to the endoplasmic 

Comment: reticulum's cytoplasmic surface. The SRP 
then aids translocation of 

Comment: the protein through the ER membrane. The 
SRP is a ribonucleoprotein 

Comment: that is composed of a small RNA and 
several proteins. One of these 

Comment: proteins is the SRP19 protein [1] (Sec65 in yeast [2,3]).

Number of members: 13 


SSB 


PDOC00602 


Single-strand binding 
protein family signatures 


The Escherichia coli single-strand binding protein [1] (gene ssb), also known
as the helix-destabilizing protein, is a protein of 177 amino acids. It
binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an
important role in DNA replication, recombination and repair.

Closely related variants of SSB are encoded in the genome of a variety of
large self-transmissible plasmids. SSB has also been characterized in bacteria
such as Proteus mirabilis or Serratia marcescens.

Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved
in mitochondrial DNA replication are structurally and evolutionarily related to
prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.

- Drosophila MtSSB.

- Yeast protein RIM1.

We have developed two signature patterns for these proteins. 
The first is a 

conserved region in the N-terminal section of the SSB's. The 
second is a 

centrally located region which, in Escherichia coli SSB, is 
known to be 

involved in the binding of DNA.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMF]-[NST]-[KRHST]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVMA]-[GST]-x-[DENT]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

Sequences known to belong to this class detected by the pattern 
A majority. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Patterns and text revised. 

References 

[1]

Meyer R.R., Laine P.S. 
Microbiol Rev. 54:342-380(1990). 

[2] 

Stroumbakis N.D., Li Z., Tolias P.P. 
Gene 143:171-177(1994).


START domain


Accession number: PF01852 
Definition: START domain 
Author: SMART 
Alignment method of seed: Manual 

Source of seed members: Alignment kindly provided by SMART 

Gathering cutoffs: 25 25 

Trusted cutoffs: 106.20 106.20

Noise cutoffs: -20.90 -20.90 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 99257451 

Reference Title: START: a lipid-binding domain in StAR, HD-ZIP and

Reference Title: signalling proteins

Reference Author: Ponting CP, Aravind L;

Reference Location: Trends Biochem Sci 1999;24:130-132.

Database reference: SMART; START;

Database Reference INTERPRO; IPR002913;

Number of members: 41 


Sterol desat 




Sterol desaturase 


Accession number: PF01598 

Definition: Sterol desaturase 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_905 (release 4.1)

Gathering cutoffs: -13 -13

Trusted cutoffs: 12.90 12.90

Noise cutoffs: -44.50 -44.50

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 91323727

Reference Title: Cloning, disruption and sequence of the 
gene encoding yeast 

Reference Title: C-5 sterol desaturase. 

Reference Author: Arthington BA, Bennett LG, Skatrud PL, Guynn CJ, Barbuch

Reference Author: RJ, Ulbright CE, Bard M;
Reference Location: Gene 1991;102:39-44.
Reference Number: [2]
Reference Medline: 96133902

Reference Title: Cloning and characterization of ERG25, the 
Saccharomyces 

Reference Title: cerevisiae gene encoding C-4 sterol methyl 
oxidase. 

Reference Author: Bard M, Bruner DA, Pierson CA, Lees ND, Biermann B, Frye L,

Reference Author: Koegel C, Barbuch R;

Reference Location: Proc Natl Acad Sci U S A 1996;93:186-190.

Reference Number: [3] 
Reference Medline: 96351930 

Reference Title: Molecular characterization of the CER1 
gene of arabidopsis 

Reference Title: involved in epicuticular wax biosynthesis 
and pollen 

Reference Title: fertility. 

Reference Author: Aarts MG, Keijzer CJ, Stiekema WJ, Pereira A;

Reference Location: Plant Cell 1995;7:2115-2127.
Database Reference INTERPRO; IPR001541;
Database reference: PFAMB; PB041851;
Comment: This family includes C-5 sterol desaturase and C-4 sterol methyl

Comment: oxidase. Members of this family are involved in cholesterol biosynthesis

Comment: and biosynthesis of plant cuticular wax. These enzymes contain many

Comment: conserved histidine residues. Members of this family are integral

Comment: membrane proteins.

Number of members: 34 


Sulfatase 


PDOC00117 


Sulfatases signatures 


Sulfatases (EC 3.1.6.-) are enzymes that hydrolyze various sulfate esters. The

sequences of different types of sulfatases are available. These enzymes are:

- Arylsulfatase A (EC 3.1.6.8) (ASA), a lysosomal enzyme which hydrolyzes
  cerebroside sulfate.

- Arylsulfatase B (EC 3.1.6.12) (ASB), a lysosomal enzyme which hydrolyzes
  the sulfate ester group from N-acetylgalactosamine 4-sulfate residues of
  dermatan sulfate.

- Arylsulfatase D (ASD).

- Arylsulfatase E (ASE).

- Steryl-sulfatase (EC 3.1.6.2) (STS) (arylsulfatase C), a membrane bound
  microsomal enzyme which hydrolyzes 3-beta-hydroxy steroid sulfates.

- Iduronate 2-sulfatase precursor (EC 3.1.6.13) (IDS), a lysosomal enzyme
  that hydrolyzes the 2-sulfate groups from non-reducing-terminal iduronic
  acid residues in dermatan sulfate and heparan sulfate.

- N-acetylgalactosamine-6-sulfatase (EC 3.1.6.4), an enzyme that hydrolyzes
  the 6-sulfate groups of the N-acetyl-D-galactosamine 6-sulfate units of
  chondroitin sulfate and the D-galactose 6-sulfate units of keratan
  sulfate.

- Choline sulfatase (EC 3.1.6.6) (gene betC), a bacterial enzyme that
  converts choline-O-sulfate to choline.

- Glucosamine-6-sulfatase (EC 3.1.6.14) (G6S), a lysosomal enzyme that
  hydrolyzes the N-acetyl-D-glucosamine 6-sulfate units of heparan sulfate
  and keratan sulfate.

- N-sulphoglucosamine sulphohydrolase (EC 3.10.1.1) (sulphamidase), the
  lysosomal enzyme that catalyzes the hydrolysis of N-sulfo-D-glucosamine into
  glucosamine and sulfate.

- Sea urchin embryo arylsulfatase (EC 3.1.6.1).

- Green alga arylsulfatase (EC 3.1.6.1), an enzyme which plays an important
  role in the mineralization of sulfates.

- Arylsulfatase (EC 3.1.6.1) from Escherichia coli (gene aslA), Klebsiella
  aerogenes (gene atsA) and Pseudomonas aeruginosa (gene atsA).

- Escherichia coli hypothetical protein yidJ.

It has been shown that all these sulfatases are structurally related [1,2,3].
As signature patterns for that family of enzymes we have selected the two best
conserved regions. Both regions are located in the N-terminal section of these
enzymes. The first region contains a conserved arginine which could be
implicated in the catalytic mechanism; it is located four residues after a
position that, in eukaryotic sulfatases, is a conserved cysteine which has
been shown [4] to be modified to 2-amino-3-oxopropionic acid. In prokaryotes,
this cysteine is replaced by a serine.
Description of pattern (s) and/or profile (s) 

Consensus pattern [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TAR]-G [R is a putative active site residue]
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern G-[YV]-x-[ST]-x(2)-[IVAS]-G-K-x(0,1)-[FYWMK]-[HL]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Patterns and text revised. 

References

[1]

Vingron M., Meyer H.E., Pohlmann R., von Figura K.

[2]

Wilson P.J., Morris C.P., Anson D.S., Occhiodoro T., Bielicki J.,
Clements P.R., Hopwood J.J.
Proc. Natl. Acad. Sci. USA 87:8531-8535(1990).

[3]

de Hostos E.L., Schilling J., Grossman A.R.
Mol. Gen. Genet. 218:229-239(1989).

[4]

Selmer T., Hallmann A., Schmidt B., Sumper M., von Figura K.
Eur. J. Biochem. 238:341-345(1996).



Sulfate transporters 
signature 



A number of proteins involved in the transport of sulfate across a 
membrane 

as well as some yet uncharacterized proteins have been shown [1,2] to be
evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2).

- Rat sulfate anion transporter 1 (SAT-1).

- Mammalian DTDST, a probable sulfate transporter which, in Human, is
  involved in the genetic disease, diastrophic dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.

- Human pendrin (gene PDS), which is involved in a number of hearing loss
  genetic diseases.

- Human protein DRA (Down-Regulated in Adenoma).

- Soybean early nodulin 70.

- Escherichia coli hypothetical protein ychM.

- Caenorhabditis elegans hypothetical protein F41D9.5.

As expected by their transport function, these proteins are highly hydrophobic
and seem to contain about 12 transmembrane domains. The best conserved region
seems to be located in the second transmembrane region and is used as a
signature pattern.

Description of pattern (s) and/or profile(s) 

Consensus pattern [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1999 / Pattern and text revised.

References 

[1] 

Sandal N.N., Marcker K.A. 

Trends Biochem. Sci. 19:19-19(1994). 

[2]

Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).


Synuclein


Accession number: PF01387 

Definition: Synuclein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: [1] 

Gathering cutoffs: 25 25 

Trusted cutoffs: 197.80 197.80

Noise cutoffs: -33.80 -33.80 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98424410

Reference Title: The synuclein family.

Reference Author: Lavedan C;

Reference Location: Genome Res 1998;8:871-880.

Database Reference INTERPRO; IPR001058;

Comment: There are three types of synucleins in humans, these

Comment: are called alpha, beta and gamma. Alpha synuclein has

Comment: been found mutated in families with autosomal dominant

Comment: Parkinson's disease. A peptide of alpha synuclein has

Comment: also been found in amyloid plaques in Alzheimer's

Comment: patients.
Number of members: 12



Attorney No. 2750-1237P 



1013 



TEA


PDOC00479


TEA domain signature


The TEA domain [1,E1] is a DNA-binding region of about 66 to 68 amino acids
which has been found in the N-terminal section of the following nuclear
regulatory proteins:

- Mammalian enhancer factor TEF-1. TEF-1 can bind to two distinct sequences
  in the SV40 enhancer and is a transcriptional activator.

- Mammalian TEF-3, TEF-4 and TEF-5 [2], putative transcriptional activators
  highly similar to TEF-1.

- Drosophila scalloped protein (gene sd), a probable transcription factor
  that functions in the regulation of cell-specific gene expression during
  Drosophila development, particularly in the differentiation of the nervous
  system [3].

- Emericella nidulans regulatory protein abaA. AbaA is involved in the
  regulation of conidiation (asexual spore); its expression leads to the
  cessation of vegetative growth.

- Yeast trans-acting factor TEC1. TEC1 is involved in the activation of the
  Ty1 retrotransposon.

- Caenorhabditis elegans hypothetical protein F28B12.2.

As a signature pattern, we have used positions 39 to 67 of the 
TEA domain. 

Description of pattern(s) and/or profile(s) 

Consensus pattern G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]-Q-V

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
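
The statement that the signature covers positions 39 to 67 of the TEA domain can be checked against the pattern itself by counting the positions it spans. The sketch below does this with a hypothetical helper, `pattern_length`, which returns the minimum and maximum number of residues a pattern can match; the pattern string is the consensus pattern of this entry with the character-recognition errors corrected.

```python
import re


def pattern_length(pattern: str) -> tuple:
    """Return (min, max) number of residues a PROSITE-style pattern can span."""
    lo_total = hi_total = 0
    for element in pattern.split("-"):
        # A trailing (n) or (n,m) is a repeat count; otherwise the element is one position.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        else:
            lo = hi = 1
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total


TEA_PATTERN = (
    "G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]-Q-V"
)

if __name__ == "__main__":
    # Positions 39 to 67 inclusive correspond to 67 - 39 + 1 residues.
    print(pattern_length(TEA_PATTERN), 67 - 39 + 1)
```

Both counts come to 29 residues, which is consistent with the stated positions.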
Last update 

November 1997 / Pattern and text revised. 

References 

[1] 

Buerglin T.R. 

Cell 66:11-12(1991). 

[2] 

Jacquemin P., Hwang J.-J., Martial J.A., Dolle P., Davidson I. 
J. Biol. Chem. 271:21775-21785(1996).

[3] 

Campbell S.D., Inamdar M., Rodrigues V., Raghavan V.,
Palazzolo M., Chovnick A.
Genes Dev. 6:367-379(1992). 

[E1] 

http://transfac.gbf-braunschweig.de/cgi-bin/qt/getEntry.pl?C0024


Queuine tRNA-ribosyltransferase


Accession number: PF01702 

Definition: Queuine tRNA-ribosyltransferase

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_1643 (release 4.1)

Gathering cutoffs: -132 -132

Trusted cutoffs: -110.00 -110.00

Noise cutoffs: -155.40 -155.40

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 96256303 

Reference Title: Crystal structure of tRNA-guanine transglycosylase: RNA
Reference Title: modification by base exchange.
Reference Author: Romier C, Reuter K, Suck D, Ficner R;
Reference Location: EMBO J 1996;15:2850-2857.
Reference Number: [2]
Reference Medline: 93287116

Reference Title: tRNA-guanine transglycosylase from Escherichia coli.

Reference Title: Overexpression, purification and quaternary structure.

Reference Author: Garcia GA, Koch KA, Chong S;
Reference Location: J Mol Biol 1993;231:489-497.
Database Reference: SCOP; 1pud; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002616;
Database Reference PDB; 1efe A; 138; 379;
Database Reference PDB; 1enu A; 138; 379;
Database Reference PDB; 1pud ; 138; 379;
Database Reference PDB; 1wkd ; 138; 379;
Database Reference PDB; 1wke ; 138; 379;
Database Reference PDB; 1wkf ; 138; 379;
Database reference: PFAMB; PB037884;
Comment: This is a family of queuine tRNA-ribosyltransferases

Comment: EC:2.4.2.29, also known as tRNA-guanine transglycosylase

Comment: and guanine insertion enzyme.
Comment: Queuine tRNA-ribosyltransferase modifies tRNAs for asparagine,

Comment: aspartic acid, histidine and tyrosine with queuine.

Comment: It catalyses the exchange of guanine-34 at the wobble position with

Comment: 7-aminomethyl-7-deazaguanine, and the addition of a cyclopentenediol

Comment: moiety to 7-aminomethyl-7-deazaguanine-34 tRNA; giving a hypermodified

Comment: base queuine in the wobble position [1,2].
Comment: The aligned region contains a zinc binding motif C-x-C-x2-C-x29-H,

Comment: and important tRNA and 7-aminomethyl-7-deazaguanine binding residues [1].
Number of members: 24 
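
The zinc binding motif C-x-C-x2-C-x29-H mentioned in the comment above uses a compact spacing notation. As an illustration only, it can be written as the regular expression `C.C.{2}C.{29}H`, where each `x` stands for any residue. The helper `find_zinc_motifs` and the test sequence below are hypothetical and are not taken from any family member.

```python
import re

# C-x-C-x2-C-x29-H: Cys, any residue, Cys, two residues, Cys, 29 residues, His.
ZINC_MOTIF = r"C.C.{2}C.{29}H"


def find_zinc_motifs(sequence: str) -> list:
    """Return 0-based start offsets of every occurrence, including overlapping ones."""
    # The lookahead lets the scan restart one residue after each hit.
    return [m.start() for m in re.finditer("(?=" + ZINC_MOTIF + ")", sequence)]


if __name__ == "__main__":
    # Invented sequence built to contain exactly one motif starting at offset 2.
    toy_sequence = "MM" + "CACAAC" + "A" * 29 + "H" + "MM"
    print(find_zinc_motifs(toy_sequence))
```

A match of this kind only shows that the spacing of the four residues is present; it says nothing about whether zinc is actually bound.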
Thi4 family


Accession number: PF01946
Definition: Thi4 family 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 526.80 526.80 

Noise cutoffs: -105.00 -105.00

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 95050223

Reference Title: Cloning, nucleotide sequence, and regulation of

Reference Title: Schizosaccharomyces pombe thi4, a thiamine biosynthetic
Reference Title: gene.

Reference Author: Zurlinden A, Schweingruber ME;
Reference Location: J Bacteriol 1994;176:6631-6635.
Database Reference INTERPRO; IPR002922;
Comment: This family includes Swiss:P32318 a putative thiamine biosynthetic
Comment: enzyme.
Number of members: 14




ThiC 




ThiC family 


Accession number: PF01964 

Definition: ThiC family 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25

Trusted cutoffs: 1047.20 1047.20

Noise cutoffs: -338.20 -338.20

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 93163063 

Reference Title: Structural genes for thiamine biosynthetic 
enzymes 

Reference Title: (thiCEFGH) in Escherichia coli K-12. 
Reference Author: Vander Horn PB, Backstrom AD, Stewart 
V, Begley TP; 

Reference Location: J Bacteriol 1993;175:982-992. 

Reference Number: [2] 

Reference Medline: 99311269 

Reference Title: Thiamin biosynthesis in prokaryotes. 

Reference Author: Begley TP, Downs DM, Ealick SE, 
McLafferty FW, Van Loon AP, 

Reference Author: Taylor S, Campobasso N, Chiu HJ, 
Kinsland C, Reddick JJ, Xi 
Reference Author: J; 

Reference Location: Arch Microbiol 1999;171:293-300. 
Reference Number: [3] 
Reference Medline: 97284509 

Reference Title: Characterization of the Bacillus subtilis thiC 
operon 

Reference Title: involved in thiamine biosynthesis. 
Reference Author: Zhang Y, Taylor SV, Chiu HJ, Begley TP; 
Reference Location: J Bacteriol 1997;179:3030-3035. 
Database Reference INTERPRO; IPR002817; 
Comment: ThiC is found within the thiamine 
biosynthesis operon. ThiC is 

Comment: involved in pyrimidine biosynthesis [2]. 
Comment: ThiC catalyzes the substitution of the 
pyrophosphate of 

Comment: 2-methyl-4-amino-5-hydroxymethylpyrimidine 
pyrophosphate by 

Comment: 4-methyl-5-(beta-hydroxyethyl)thiazole 
phosphate to yield thiamine 

Comment: phosphate [3]. 

Number of members: 12 


ThiJ 




ThiJ/PfpI family 


Accession number: PF01965 

Definition: ThiJ/PfpI family 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: -40.2 -40.2 

Trusted cutoffs: -40.20 -40.20 

Noise cutoffs: -47.00 -47.00 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97039868 

Reference Title: The thiJ locus and its relation to 

phosphorylation of 

Reference Title: hydroxymethylpyrimidine in Escherichia 
coli. 

Reference Author: Mizote T, Tsuda M, Nakazawa T, 
Nakayama H; 

Reference Location: Microbiology 1996;142:2969-2974. 
Reference Number: [2] 
Reference Medline: 96196168 

Reference Title: Sequence, expression in Escherichia coli, 
and analysis of 

Reference Title: the gene encoding a novel intracellular 
protease (PfpI) 

Reference Title: from the hyperthermophilic archaeon 
Pyrococcus furiosus. 

Reference Author: Halio SB, Blumentals II, Short SA, Merrill 
BM, Kelly RM; 

Reference Location: J Bacteriol 1996;178:2605-2612. 
Database Reference INTERPRO; IPR002818; 
Database reference: PFAMB; PB002774; 
Database reference: PFAMB; PB007213; 
Database reference: PFAMB; PB041784; 
Comment: This family includes ThiJ a thiamine 
biosynthesis 

Comment: enzyme [1] that catalyses the 
phosphorylation of 

Comment: hydroxymethylpyrimidine (HMP) to HMP 
monophosphate EC:2.7.1.49. 

Comment: The family also includes the protease PfpI 
Swiss:Q51732 [2]. 

Number of members: 34 


Thr_dehydrat_C 




C-terminal domain of 
Threonine dehydratase 


Accession number: PF00585 

Definition: C-terminal domain of Threonine dehydratase 

Previous Pfam IDs: Thr dehydratase ^; 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Bateman A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 99.90 51.30 

Noise cutoffs: -1.10 -1.10 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98230745 

Reference Title: Structure and control of pyridoxal 

phosphate dependent 

Reference Title: allosteric threonine deaminase. 
Reference Author: Gallagher DT, Gilliland GL, Xiao G, 
Zondlo J, Fisher KE, 

Reference Author: Chinchilla D, Eisenstein E; 
Reference Location: Structure 1998;6:465-475. 
Database Reference: SCOP; 1tdj; fa; [SCOP-USA][CATH-PDBSUM] 

Database Reference INTERPRO; IPR001721; 

Database Reference PDB; 1tdj ; 424; 512; 

Database Reference PDB; 1tdj ; 329; 419; 

Comment: -!- Threonine dehydratases PALP all contain 

a carboxy 

Comment: terminal region. This region may have a 
regulatory role. 

Comment: Some members contain two copies of this 
region. 

Number of members: 30 
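
The three cutoff fields that each Pfam-style entry lists can be read as bit-score thresholds for deciding whether a search hit belongs to the family. The sketch below illustrates that reading with the first value of each pair from the entry above (gathering 25, trusted 99.90, noise -1.10). It assumes the usual Pfam convention (gathering = inclusion threshold, trusted = lowest score of an accepted match, noise = highest score of a rejected match); the function name and the category labels are hypothetical and are not part of the entry.

```python
def classify_hit(score: float, gathering: float, trusted: float, noise: float) -> str:
    """Place a bit score relative to the three cutoffs of one family entry.

    Assumption: noise <= gathering <= trusted, and only one value per
    cutoff is considered (the entries list two values per cutoff).
    """
    if score >= trusted:
        return "trusted"            # at least as good as the weakest accepted match
    if score >= gathering:
        return "member"             # passes the inclusion threshold
    if score > noise:
        return "below gathering"    # better than known noise, but not included
    return "noise"                  # no better than the best rejected match


# Cutoffs copied from the Thr_dehydrat_C entry (first value of each pair).
THR_DEHYDRAT_C = {"gathering": 25.0, "trusted": 99.90, "noise": -1.10}

if __name__ == "__main__":
    for score in (120.0, 40.0, 10.0, -5.0):
        print(score, classify_hit(score, **THR_DEHYDRAT_C))
```

Keeping the thresholds in a small dictionary per family makes it easy to reuse the same function for the other entries in this table.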


thymidylatsynt 


PDOC00086 


Thymidylate synthase 
active site 


Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive 
methylation 

of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate 

to dihydrofolate. Thymidylate synthase plays an essential role 
in DNA 

synthesis and is an important target for certain chemotherapeutic 
drugs. 

Thymidylate synthase is an enzyme of about 30 to 35 Kd in most 
species except 

in protozoan and plants where it exists as a bifunctional enzyme 
that includes 

a dihydrofolate reductase domain. 

A cysteine residue is involved in the catalytic mechanism (it 
covalently binds 

the 5,6-dihydro-dUMP intermediate). The sequence around the 
active site of 

this enzyme is conserved from phages to vertebrates. 
Description of pattern(s) and/or profile(s) 

Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV] 
[C is the active site residue] 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
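
The consensus patterns in these entries use PROSITE notation: `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch of how such a pattern could be applied to a sequence, the helper below translates the notation into a Python regular expression. The function name `prosite_to_regex` is hypothetical, and the pattern string is an assumed reading of the thymidylate synthase pattern printed above, not an authoritative copy of the database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Separate the residue specification from an optional (n) or (n,m) count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists residues that are not allowed
        # Single letters and [..] classes are already valid regex syntax.
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(core + repeat)
    return "".join(parts)


# Assumed reading of the thymidylate synthase active-site pattern.
TS_PATTERN = "R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV]"

if __name__ == "__main__":
    print(prosite_to_regex(TS_PATTERN))
```

The same helper can be reused for the other consensus patterns in this section, since they all follow the same notation.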
Last update 

November 1997 / Pattern and text revised. 
References 
[1] 

Benkovic S.J. 

Annu. Rev. Biochem. 49:227-251(1980). 
[2] 

Ross P., O'Gara F., Condon S. 

Appl. Environ. Microbiol. 56:2156-2163(1990). 


Top6A 




Type II DNA 
topoisomerase 


Accession number: PF01962 

Definition: Type II DNA topoisomerase 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: -99 -99 

Trusted cutoffs: -40.40 -40.40 

Noise cutoffs: -158.40 -158.40 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97238688 

Reference Title: An atypical topoisomerase II from Archaea 
with implications 

Reference Title: for meiotic recombination [see comments] 

Reference Author: Bergerat A, de Massy B, Gadelle D, 
Varoutas PC, Nicolas A, 

Reference Author: Forterre P; 

Reference Location: Nature 1997;386:414-417. 

Database Reference: SCOP; 1d3y; fa; [SCOP-USA][CATH-PDBSUM] 

Database Reference INTERPRO; IPR002815; 
Database Reference PDB; 1d3y A; 77; 363; 
Database Reference PDB; 1d3y B; 77; 363; 
Comment: Members of this family are the A subunit 
from type II DNA 

Comment: topoisomerases. Type II DNA 
topoisomerases catalyse the relaxation 
Comment: of DNA supercoiling by causing transient 
double strand breaks. 

Comment: The family includes topoisomerase VI 
subunit A from archaebacteria 

Comment: Swiss:Q57815 EC:5.99.1.3 and SPO11 
from yeast Swiss:P23179. 

Comment: A conserved tyrosine is thought to be 
involved in breaking the 
Comment: double stranded DNA [1]. 
Number of members: 9 


Topoisom_bac 


PDOC00333 


Prokaryotic DNA 
topoisomerase I active 
site 


DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the 
two types of 

enzyme that catalyze the interconversion of topological DNA 
isomers. Type I 

topoisomerases act by catalyzing the transient breakage of DNA, 
one strand at 

a time, and the subsequent rejoining of the strands. When a 
prokaryotic type I 

topoisomerase breaks a DNA backbone bond, it simultaneously 
forms a protein- 

DNA link where the hydroxyl group of a tyrosine residue is 
joined to a 5'- 

phosphate on DNA, at one end of the enzyme-severed DNA 
strand. 

Prokaryotic organisms, such as Escherichia coli, have two type I 
topoisomerase 

isozymes: topoisomerase I (gene topA) and topoisomerase III 
(gene topB). 

Eukaryotes also contain homologs of prokaryotic topoisomerase 
III. 

There are a number of conserved residues in the region around 
the active site 

tyrosine; we used this region as a signature pattern. 



Description of pattern(s) and/or profile(s) 

Consensus pattern [EQ]-x-L-Y-[DEQST]-x(3,12)-[LIV]-[ST]-Y-x-R-[ST]-[DEQS] 
[The second Y is the active site tyrosine] 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Sternglanz R. 

Curr. Opin. Cell Biol. 1:533-535(1990). 
[2] 

Sharma A., Mondragon A. 

Curr. Opin. Struct. Biol. 5:39-47(1995). 

[3] 

Bjornsti M.-A. 

Curr. Opin. Struct Biol. 1:99-103(1991). 
[4] 

Roca J. 

Trends Biochem. Sci. 20:156-160(1995). 
[E1] 

http://ellington.pharm.arizona.edu/~bear/top/topo.html 
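
Because the prokaryotic topoisomerase I signature marks its second tyrosine as the active-site residue, a pattern match can be used to report the position of that residue and not only the presence of the motif. The sketch below does this with a capture group. The regular expression is an assumed reading of the consensus pattern in this entry, and `active_site_tyrosines` is a hypothetical helper name.

```python
import re

# Assumed reading of the signature; the parenthesised Y is the active-site tyrosine.
TOPO1_SIGNATURE = re.compile(r"[EQ].LY[DEQST].{3,12}[LIV][ST](Y).R[ST][DEQS]")


def active_site_tyrosines(sequence: str) -> list:
    """Return 1-based positions of the tyrosine captured by each signature match."""
    return [m.start(1) + 1 for m in TOPO1_SIGNATURE.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic test sequence, not a real topoisomerase.
    example = "MK" + "EALYD" + "AAA" + "LSYARTD" + "GG"
    print(active_site_tyrosines(example))
```

Positions are reported 1-based to match the residue numbering used in protein database records.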



Accession number: PF00537 

Definition: long chain scorpion toxins 

Author: Bateman A 

Alignment method of seed: Manual 

Source of seed members: Arne Elofsson. 

Gathering cutoffs: 25 25 

Trusted cutoffs: 59.50 59.50 

Noise cutoffs: -3.80 -3.80 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Database Reference: SCOP; 2sn3; fa; [SCOP-USA][CATH-PDBSUM] 

Database Reference INTERPRO; IPR002061; 

Comment: -!- Scorpion toxins bind to sodium channels 

and inhibit the activation 

Comment: mechanisms of the channels, thereby 

blocking neuronal transmission. 
Number of members: 77 


Translin 


Translin family 



Accession number: PF01997 

Definition: Translin family 

Previous Pfam IDs: DUF130; 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 299.50 299.50 

Noise cutoffs: -72.40 -72.40 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 



Reference Number: [1] 

Reference Medline: 97165975 

Reference Title: Isolation and characterization of a cDNA 
encoding a 

Reference Title: Translin-like protein, TRAX. 
Reference Author: Aoki K, Ishida R, Kasai M; 
Reference Location: FEBS Lett 1997;401:109-112. 
Database Reference INTERPRO; IPR002848; 
Comment: Members of this family include Translin 
Swiss:Q15631 that interacts 

Comment: with DNA and forms a ring around the DNA. 
This family also includes 

Comment: Swiss:Q99598, that was found to interact 
with translin with yeast 

Comment: two-hybrid screen [1]. 

Number of members: 10 


Transposase 19 




Transposase 19 


Members of this family are capable of in vitro and/or in vivo 
insertion of a donor polynucleotide into a target polynucleotide. 
Such biological activity is useful for inserting DNA into host 
genome, for example, for cloning purposes to generate a desired 
vector in vitro. 


TRANSPOSASE_IS30 


PDOC00801 


Transposases, IS30 
family, signature 


Autonomous mobile genetic elements such as transposons or 
insertion sequences 

(IS) encode an enzyme, called transposase, required for excising 
and inserting 

the mobile element. On the basis of sequence similarities, 
transposases can be 

grouped into various families. One of these families has been 
shown [1,2] to 

consist of transposases from the following elements: 

- IS30 from Escherichia coli. 

- IS1086 from Alcaligenes eutrophus. 

- IS1161 from Streptococcus salivarius. 

- IS4351 (Tn4551) from Bacteroides fragilis. 

These transposases are proteins of 340 to 380 amino acids. The 
best conserved 

region is located in their C-terminal section and is used as a 
signature 

pattern. 

Description of pattern(s) and/or profile(s) 

Consensus pattern R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]- 
[LIVMFY](2)-P-K 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / First entry. 

References 

[1] 

Dong Q., Sadouk A., van der Lelie D., Taghavi S., Ferhat A., 
Nuyten J.M., Borremans B., Mergeay M., Toussaint A. 
J. Bacteriol. 174:8133-8138(1992). 

[2] 

Giffard P.M., Rathsam C., Kwan E., Kwan D.W.L., Bunny K.L., 
Koo S.-P., Jacques N.A. 

J. Gen. Microbiol. 139:913-920(1993). 


Transthyretin 


PDOC00617 


Transthyretin signatures 


Transthyretin (prealbumin) [1] is a thyroid hormone-binding 
protein that seems 

to transport thyroxine (T4) from the bloodstream to the brain. It is 
a protein 

of about 130 amino acids that assembles as a homotetramer 
and forms an 

internal channel that binds thyroxine. Transthyretin is mainly 
synthesized in 

the brain choroid plexus. In humans, variants of the protein are 
associated 

with distinct forms of amyloidosis. 

The sequence of transthyretin is highly conserved in vertebrates. 
A number of 

uncharacterized proteins also belong to this family: 

- Escherichia coli hypothetical protein yedX. 

- Bacillus subtilis hypothetical protein yunM. 
- Caenorhabditis elegans hypothetical protein R09H10.3. 

- Caenorhabditis elegans hypothetical protein ZK697.8. 

We selected two regions as signature patterns. The first located 
in the N- 

terminal extremity starts with a lysine known to be involved in 
binding T4. 

The second pattern is located in the C-terminal extremity. 
Description of pattern(s) and/or profile(s) 

Consensus pattern [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] 
[The K binds thyroxine] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
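
Since transthyretin is described by two signatures, one near the N-terminus and one near the C-terminus, a stricter test is to require both and to check that they occur in that order. The sketch below illustrates the idea. Both regular expressions are assumed readings of the two consensus patterns in this entry, and `has_both_signatures` is a hypothetical helper name.

```python
import re

# Assumed readings of the two transthyretin signatures.
N_TERMINAL = re.compile(r"[KH][IV]L[DN].{3}G.PA.{2}[IV].[IV]")
C_TERMINAL = re.compile(r"Y[TH][IV][AP].{2}LS[PQ][FYW][GS][FY][QS]")


def has_both_signatures(sequence: str) -> bool:
    """True if the N-terminal signature occurs before the C-terminal one."""
    n_hit = N_TERMINAL.search(sequence)
    c_hit = C_TERMINAL.search(sequence)
    return n_hit is not None and c_hit is not None and n_hit.start() < c_hit.start()


if __name__ == "__main__":
    # Synthetic fragments built to satisfy each signature, not real sequences.
    n_fragment = "KVLDAAAGAPAAAIAV"
    c_fragment = "YTIAAALSPFGYQ"
    print(has_both_signatures("M" + n_fragment + "G" * 20 + c_fragment))
```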
Last update 

July 1999 / Patterns and text revised. 

References 

[1] 

Schreiber G., Richardson S.J. 

Comp. Biochem. Physiol. 116B:137-160(1997). 


TRM 




N2,N2-dimethylguanosine tRNA 
methyltransferase 


Accession number: PF02005 

Definition: N2,N2-dimethylguanosine tRNA 
methyltransferase 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 664.60 664.60 

Noise cutoffs: -259.50 -259.50 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 98352211 

Reference Title: The tRNA(guanine-26,N2-N2) 
methyltransferase (Trm1) from 

Reference Title: the hyperthermophilic archaeon Pyrococcus 
furiosus: 

Reference Title: cloning, sequencing of the gene and its 
expression in 

Reference Title: Escherichia coli. 

Reference Author: Constantinesco F, Benachenhou N, 

Motorin Y, Grosjean H; 

Reference Location: Nucleic Acids Res 1998;26:3753-3761. 
Reference Number: [2] 
Reference Medline: 87260951 

Reference Title: Amino-terminal extension generated from 
an upstream AUG 

Reference Title: codon is not required for mitochondrial 
import of yeast 

Reference Title: N2,N2-dimethylguanosine-specific tRNA 
methyltransferase. 

Reference Author: Ellis SR, Hopper AK, Martin NC; 
Reference Location: Proc Natl Acad Sci U S A 1987;84:5172- 
5176. 

Database Reference INTERPRO; IPR002905; 

Database reference: PFAMB; PB041661; 

Comment: This enzyme EC:2.1.1.32 uses S-AdoMet to 
methylate tRNA. 

Comment: The TRM1 gene of Saccharomyces 
cerevisiae is necessary for 

Comment: the N2,N2-dimethylguanosine modification 
of both mitochondrial 

Comment: and cytoplasmic tRNAs [1]. The enzyme is 
found in both 

Comment: eukaryotes and archaebacteria [2]. 
Number of members: 10 


tRNA bind 




Putative tRNA binding 
domain 


Accession number: PF01588 

Definition: Putative tRNA binding domain 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_482 (release 4.1) 

Gathering cutoffs: 20 20 

Trusted cutoffs: 22.30 22.30 

Noise cutoffs: 18.20 18.20 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97306356 

Reference Title: Human tyrosyl-tRNA synthetase shares 
amino acid sequence 

Reference Title: homology with a putative cytokine. 
Reference Author: Kleeman TA, Wei D, Simpson KL, First 
EA; 

Reference Location: J Biol Chem 1997;272:14420-14425. 
Reference Number: [2] 
Reference Medline: 97050848 

Reference Title: The yeast protein Arc1p binds to tRNA and 
functions as a 

Reference Title: cofactor for the methionyl- and glutamyl- 
tRNA synthetases. 

Reference Author: Simos G, Segref A, Fasiolo F, Hellmuth K, 
Shevchenko A, 

Reference Author: Mann M, Hurt EC; 
Reference Location: EMBO J 1996;15:5437-5448. 
Database Reference: SCOP; 1pys; fa; [SCOP-USA][CATH-PDBSUM] 

Database Reference INTERPRO; IPR002547; 
Database Reference PDB; 1b70 B; 153; 247; 
Database Reference PDB; 1b7y B; 153; 247; 
Database Reference PDB; 1eiy B; 153; 247; 
Database Reference PDB; 1pys B; 153; 247; 
Database reference: PFAMB; PB010015; 
Comment: This domain is found in prokaryotic 
methionyl-tRNA synthetases, 

Comment: prokaryotic phenylalanyl tRNA synthetases 
the yeast GU4 nucleic-binding 

Comment: protein (G4p1 or p42, ARC1) [2], human 
tyrosyl-tRNA synthetase [1], 

Comment: and endothelial-monocyte activating 
polypeptide II. 

Comment: G4p1 binds specifically to tRNA form a 
complex with methionyl-tRNA 

Comment: synthetases [2]. In human tyrosyl-tRNA 
synthetase this domain may direct 

Comment: tRNA to the active site of the enzyme [2]. 
This domain may perform a 

Comment: common function in tRNA aminoacylation 
[1]. 

Number of members: 46 


tRNA-synt_2d 


PDOC00363 


Aminoacyl-transfer RNA 
synthetases class-II 
signatures 


Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of 
enzymes which 

activate amino acids and transfer them to specific tRNA 
molecules as the first 

step in protein biosynthesis. In prokaryotic organisms there are 
at least 

twenty different types of aminoacyl-tRNA synthetases, one for 
each different 

amino acid. In eukaryotes there are generally two aminoacyl- 
tRNA synthetases 

for each different amino acid: one cytosolic form and a 
mitochondrial form. 

While all these enzymes have a common function, they are 
widely diverse in 

terms of subunit size and of quaternary structure. 

The synthetases specific for alanine, asparagine, aspartic acid, 
glycine, 

histidine, lysine, phenylalanine, proline, serine, and threonine are 
referred 

to as class-II synthetases [2 to 6] and probably have a common 
folding pattern 

in their catalytic domain for the binding of ATP and amino acid 
which is 

different to the Rossmann fold observed for the class I 
synthetases [7]. 

Class-II tRNA synthetases do not share a high degree of 
similarity, however at 

least three conserved regions are present [2,5,8]. We have 
derived signature 

patterns from two of these regions. 

Description of pattern(s) and/or profile(s) 

Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE] 
Sequences known to belong to this class detected by the pattern 
the majority of class-II tRNA synthetases with the exception of 
those specific for alanine, glycine as well as bacterial histidine. 
Other sequence(s) detected in SWISS-PROT 43. 

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY] 

Sequences known to belong to this class detected by the pattern 
the majority of class-II tRNA synthetases with the exception of 
those specific for serine and proline. 

Other sequence(s) detected in SWISS-PROT 161. 
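
The second class-II signature uses a curly-brace element, which in PROSITE notation excludes the listed residues at that position. As a small illustration, the sketch below writes that position as a negated character class in a Python regular expression. The expression is an assumed reading of the second consensus pattern in this entry, and the two ten-residue peptides are synthetic examples chosen only to show the effect of the excluded position.

```python
import re

# Assumed reading of the second class-II signature; [^DENQHRKP] encodes {DENQHRKP}.
CLASS_II_SIGNATURE_2 = re.compile(
    r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]"
)


def matches_signature_2(sequence: str) -> bool:
    """True if the sequence contains the second class-II signature."""
    return CLASS_II_SIGNATURE_2.search(sequence) is not None


if __name__ == "__main__":
    print(matches_signature_2("GAGLDRLALL"))  # alanine at the excluded position is allowed
    print(matches_signature_2("GDGLDRLALL"))  # aspartate at that position is excluded
```

The entry notes that this pattern also picks up many unrelated SWISS-PROT sequences, so a match on its own is weak evidence and is best combined with the first signature.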

Expert(s) to contact by email 

Cusack S. cusack@embl-grenoble.fr 

Last update 

July 1998 / Text revised. 


trypsin 


PDOC00124 


Serine proteases, trypsin 
family, active sites 


The catalytic activity of the serine proteases from the trypsin 
family is 

provided by a charge relay system involving an aspartic acid 
residue hydrogen- 
bonded to a histidine, which itself is hydrogen-bonded to a 
serine. The 

sequences in the vicinity of the active site serine and histidine 
residues are 

well conserved in this family of proteases [1]. A partial list of 
proteases 

known to belong to the trypsin family is shown below. 



- Acrosin. 

- Blood coagulation factors VII, IX, X, XI and XII, thrombin, 
plasminogen, 

and protein C. 

- Cathepsin G. 

- Chymotrypsins. 

- Complement components C1r, C1s, C2, and complement 
factors B, D and I. 

- Complement-activating component of RA-reactive factor. 

- Cytotoxic cell proteases (granzymes A to H). 

- Duodenase I. 

- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin). 

- Enterokinase (EC 3.4.21.9) (enteropeptidase) . 

- Hepatocyte growth factor activator. 

- Hepsin. 

- Glandular (tissue) kallikreins (including EGF-binding protein 
types A, B, 

and C, NGF-gamma chain, gamma-renin, prostate specific 
antigen (PSA) and 
tonin). 

- Plasma kallikrein. 

- Mast cell proteases (MCP) 1 (chymase) to 8. 

- Myeloblastin (proteinase 3) (Wegener's autoantigen). 

- Plasminogen activators (urokinase-type, and tissue-type). 
- Trypsins I, II, III, and IV. 

- Tryptases. 

- Snake venom proteases such as ancrod, batroxobin, 
cerastobin, flavoxobin, 
and protein C activator. 

- Collagenase from common cattle grub and collagenolytic 
protease from 

Atlantic sand fiddler crab. 

- Apolipoprotein(a). 

- Blood fluke cercarial protease. 

- Drosophila trypsin like proteases: alpha, easter, snake-locus. 

- Drosophila protease stubble (gene sb). 

- Major mite fecal allergen Der p III. 

All the above proteins belong to family S1 in the classification of 
peptidases 

[2,E1] and originate from eukaryotic species. It should be 
noted that 

bacterial proteases that belong to family S2A are similar 
enough in the 

regions of the active site residues that they can be picked up by 
the same 

patterns. These proteases are listed below. 

- Achromobacter lyticus protease I. 

- Lysobacter alpha-lytic protease. 

- Streptogrisin A and B (Streptomyces proteases A and B). 

- Streptomyces griseus glutamyl endopeptidase II. 

- Streptomyces fradiae proteases 1 and 2. 



Description of pattern(s) and/or profile(s) 

Consensus pattern [LIVM]-[ST]-A-[STAG]-H-C [H is the active site 
residue] 

Sequences known to belong to this class detected by the pattern 
ALL, except for complement components C1r and C1s, pig 
plasminogen, bovine protein C, rodent urokinase, ancrod, gyroxin 
and two insect trypsins. 

Other sequence(s) detected in SWISS-PROT 14. 

Consensus pattern [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH] 
[S is the active site residue] 

Sequences known to belong to this class detected by the pattern 
ALL, except for 18 different proteases which have lost the first 
conserved glycine. 

Other sequence(s) detected in SWISS-PROT H. influenzae 
protease HAP which belongs to family S6 and 3 other proteins. 

Note: if a protein includes both the serine and the histidine active 
site signatures, the probability of it being a trypsin family serine 
protease is 100%. 
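
The note above treats the presence of both active-site signatures as decisive, so a simple check reports the two signatures separately and leaves the combined decision to the caller. The sketch below is one way to do that. Both regular expressions are assumed readings of the histidine and serine consensus patterns in this entry, `signature_status` is a hypothetical helper name, and the test fragments are short stretches of the kind found around the trypsin active-site residues, used here only as illustration.

```python
import re

# Assumed readings of the two trypsin-family active-site signatures.
HIS_SIGNATURE = re.compile(r"[LIVM][ST]A[STAG]HC")
SER_SIGNATURE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE]SG[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)


def signature_status(sequence: str) -> tuple:
    """Return (has_histidine_signature, has_serine_signature) for a sequence."""
    has_his = HIS_SIGNATURE.search(sequence) is not None
    has_ser = SER_SIGNATURE.search(sequence) is not None
    return has_his, has_ser


if __name__ == "__main__":
    fragment = "AAVSAAHCAA" + "DSCQGDSGGPVV"
    has_his, has_ser = signature_status(fragment)
    print("both signatures present:", has_his and has_ser)
```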
Last update 

November 1997 / Text revised. 

References 

[1] 

Brenner S. 

Nature 334:528-530(1988). 
[2] 

Rawlings N.D., Barrett A.J. 
Meth. Enzymol. 244:19-61(1994). 
[E1] 
http://www.expasy.ch/cgi-bin/lists?peptidas.txt 


TYA 




TYA transposon protein 


Accession number: PF01021 

Definition: TYA transposon protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_90 (release 3.0) 

Gathering cutoffs: 15 15 

Trusted cutoffs: 18.00 18.00 

Noise cutoffs: 13.70 13.70 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97404699 

Reference Title: Cryo-electron microscopy structure of yeast 
Ty 

Reference Title: retrotransposon virus-like particles. 
Reference Author: Palmer KJ, Tichelaar W, Myers N, Burns 
NR, Butcher SJ, 

Reference Author: Kingsman AJ, Fuller SD, Saibil HR; 
Reference Location: J Virol 1997;71:6863-6868. 
Database Reference INTERPRO; IPR001042; 
Comment: Ty are yeast transposons. A 5.7kb 
transcript codes 

Comment: for p3 a fusion protein of TYA and TYB. 
The TYA 

Comment: protein is analogous to the gag protein of 
retroviruses. 

Comment: TYA a is cleaved to form 46kd protein which 
can form 

Comment: mature virion like particles [1]. 
Number of members: 62 
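
The family entries in this table share a simple "Field: value" layout in which a field such as Comment or Reference Title may repeat across several lines. A minimal sketch of reading one entry into a dictionary is shown below. The function name `parse_entry` is hypothetical, the sample text is a shortened copy of the TYA entry above, and lines that do not have the "Field: value" shape (for example wrapped continuation lines) are simply skipped in this simplified version.

```python
def parse_entry(text: str) -> dict:
    """Collect 'Field: value' lines, joining repeated fields with a space."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a 'Field: value' line in this simplified reader
        fields.setdefault(key.strip(), []).append(value.strip())
    return {key: " ".join(values) for key, values in fields.items()}


# Shortened copy of the TYA entry, used only as sample input.
TYA_SAMPLE = """\
Accession number: PF01021
Definition: TYA transposon protein
Gathering cutoffs: 15 15
Reference Location: J Virol 1997;71:6863-6868.
Comment: Ty are yeast transposons. A 5.7kb transcript codes
Comment: for p3 a fusion protein of TYA and TYB.
"""

if __name__ == "__main__":
    entry = parse_entry(TYA_SAMPLE)
    print(entry["Accession number"], entry["Definition"])
```

Splitting only on the first colon keeps values such as journal locations, which contain colons themselves, intact.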


tyrosinase 


PDOC00398 


Tyrosinase signatures 


Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that 
catalyzes the 

hydroxylation of monophenols and the oxidation of o-diphenols 
to o-quinones. 

This enzyme, found in prokaryotes as well as in eukaryotes, is 
involved in the 

formation of pigments such as melanins and other polyphenolic 
compounds. 

Tyrosinase binds two copper ions (CuA and CuB). Each of the 
two copper ions has 

been shown [2] to be bound by three conserved histidine 
residues. The regions 

around these copper-binding ligands are well conserved and also 
shared by some 

hemocyanins, which are copper-containing oxygen carriers from 
the hemolymph of 

many molluscs and arthropods [3,4]. 

At least two proteins related to tyrosinase are known to exist in 
mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 
5,6-dihydroxyindole-2-carboxylic acid (DHICA) to indole-5,6-quinone-2- 
carboxylic acid. 

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme 
DOPAchrome tautomerase 

(EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to 
DHICA. TRP-2 

differs from tyrosinases and TRP-1 in that it binds two zinc ions 
instead 
of copper [7]. 

Other proteins that belong to this family are: 

- Plants polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze 
the oxidation 

of mono- and o-diphenols to o-diquinones [8]. 

- Caenorhabditis elegans hypothetical protein C02C2.1. 

We have derived two signature patterns for tyrosinase and 
related proteins. 

The first one contains two of the histidines that bind CuA, and is 
located in 

the N-terminal section of tyrosinase. The second pattern contains 
a histidine 

that binds CuB, that pattern is located in the central section of the 
enzyme. 



Description of pattern(s) and/or profile(s) 

Consensus pattern H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LVM]-x(3)-E 
[The two H's are copper ligands] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a 
copper ligand] 

Sequences known to belong to this class detected by the pattern 
ALL the tyrosinases as well as all the hemocyanins. 
Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Patterns and text revised. 

References 

[1] 

Lerch K. 

Prog. Clin. Biol. Res. 256:85-98(1988). 
[2] 

Jackman M.P., Hajnal A., Lerch K. 
Biochem. J. 274:707-713(1991). 

[3] 

Linzen B. 

Naturwissenschaften 76:206-211(1989). 
[4] 

Lang W.H., van Holde K.E. 

Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991). 

[5] 

Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., 
Imokawa G., Brewington T., Solano F., Garcia-Borron J.C., 
Hearing V.J. 

EMBO J. 13:5818-5825(1994). 
[6] 

Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., 
Gilbert D.J., Jenkins N.A., Hearing V. 
EMBO J. 11:527-535(1992). 

[7] 

Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia- 
Borron J.C., Lozano J.A. 

Biochem. Biophys. Res. Commun. 204:1243-1250(1994). 
[8] 

Cary J.W., Lax A.R., Flurkey W.H. 
Plant Mol. Biol. 20:245-253(1992). 


UbiA 


PDOC00727 


UbiA prenyltransferase 
family signature 


The following prenyltransferases are evolutionary related [1,2]: 

- Bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA). 

- Yeast mitochondrial parahydroxybenzoate-polyprenyltransferase 
(gene COQ2). 

- Protoheme IX farnesyltransferase (heme O synthase) from 
yeast and mammals 

(gene COX10) and from bacteria (genes cyoE or ctaB). 

These proteins probably contain seven transmembrane 
segments. The best 

conserved region is located in a loop between the second and 
third of these 

segments and we used it as a signature pattern. 

Description of pattern(s) and/or profile(s) 

Consensus pattern N-x(3)-[DEH]-x(2)-[LIMF]-D-x(2)-[VM]-x-R-[ST]-x(2)-R-x(4)-G 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Melzer M., Heide L. 

Biochim. Biophys. Acta 1212:93-102(1994). 
[2] 

Mogi T., Saiki K., Anraku Y. 
Mol. Microbiol. 14:391-398(1994). 


Ubie_methyltran 


PDOC00911 


ubiE/COQ5 
methyltransferase family 
signatures 


The following methyltransferases have been shown [1] to 
share regions of 

similarities: 

- Escherichia coli ubiE, which is involved in both ubiquinone and 
menaquinone 

biosynthesis and which catalyzes the S-adenosylmethionine 
dependent 

methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 
2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol and of demethylmenaquinol 
into menaquinol. 

- Yeast COQ5, a ubiquinone biosynthesis methyltransferase. 

- Bacillus subtilis spore germination protein C2 (gene: gerCB or 
gerC2), a 

probable menaquinone biosynthesis methyltransferase. 
- Lactococcus lactis gerC2 homolog. 

- Caenorhabditis elegans hypothetical protein ZK652.9. 

- Leishmania donovani amastigote-specific protein A41. 

These are hydrophilic proteins of about 30 Kd (except for ZK652.9 
which is 65 

Kd). They can be picked up in the database by the following 
patterns. 
Description of pattern(s) and/or profile(s) 

Consensus pattern Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern R-V-[LIVM]-K-[PV]-[GM]-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. 
J. Bacteriol. 179:1748-1754(1997). 


ubiquitin 


PDOC00271 


Ubiquitin domain 
signature and profile 


Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, 
found in 

all eukaryotic cells and whose sequence is extremely well 
conserved from 

protozoan to vertebrates. It plays a key role in a variety of 
cellular 

processes, such as ATP-dependent selective degradation of 
cellular proteins, 

maintenance of chromatin structure, regulation of gene 
expression, stress 

response and ribosome biogenesis. 

In most species, there are many genes coding for ubiquitin. 
However they can 

be classified into two classes. The first class produces 
polyubiquitin 

molecules consisting of exact head to tail repeats of ubiquitin. The 
number of 

repeats is variable (up to twelve in a Xenopus gene). In the 
majority of 

polyubiquitin precursors, there is a final amino acid after the last 
repeat. 

The second class of genes produces precursor proteins 
consisting of a single 

copy of ubiquitin fused to a C-terminal extension protein (CEP). 
There are two 

types of CEP proteins and both seem to be ribosomal proteins. 

Ubiquitin is a globular protein, the last four C-terminal residues
(Leu-Arg-Gly-Gly) extending from the compact structure to form a
'tail', important for its function. The latter is mediated by the
covalent conjugation of ubiquitin to target proteins, by an
isopeptide linkage between the C-terminal glycine and the epsilon
amino group of lysine residues in the target proteins.

There are a number of proteins which are evolutionarily related to
ubiquitin:

- Ubiquitin-like proteins from baculoviruses as well as in some
  strains of bovine viral diarrhea viruses (BVDV). These proteins are
  highly similar to their eukaryotic counterparts.

- Mammalian protein GDX [4]. GDX is composed of two domains, an
  N-terminal ubiquitin-like domain of 74 residues and a C-terminal
  domain of 83 residues with some similarity with the thyroglobulin
  hormonogenic site.

- Mammalian protein FAU [5]. FAU is a fusion protein which consists
  of an N-terminal ubiquitin-like protein of 74 residues fused to
  ribosomal protein S30.

- Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 residues.

- Human protein BAT3, a large fusion protein of 1132 residues that
  contains an N-terminal ubiquitin-like domain.

- Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein
  which consists of an N-terminal ubiquitin-like protein of 70
  residues fused to ribosomal protein S27A.

- Yeast DNA repair protein RAD23 [8]. RAD23 contains an N-terminal
  domain that seems to be distantly, yet significantly, related to
  ubiquitin.

- Mammalian RAD23-related proteins RAD23A and RAD23B.

- Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein of
  274 residues that contains a central ubiquitin-like domain.

- Human spliceosome associated protein 114 (SAP 114 or SF3A120).

- Yeast protein DSK2, a protein involved in spindle pole body
  duplication and which contains an N-terminal ubiquitin-like domain.

- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11
  and Caenorhabditis elegans hypothetical protein F53F4.3. These
  proteins contain an N-terminal ubiquitin domain and a C-terminal
  CAP-Gly domain (see <PDOC00660>).

- Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This
  protein contains an N-terminal ubiquitin domain.

- Yeast protein SMT3.

- Human ubiquitin-like proteins SMT3A and SMT3B.

- Human ubiquitin-like protein SMT3C (also known as PIC1, Ubl1,
  Sumo-1, Gmp-1 or Sentrin). This protein is involved in targeting
  ranGAP1 to the nuclear pore complex protein ranBP2.

- SMT3-like proteins in plants and Caenorhabditis elegans.

To identify ubiquitin and related proteins we have developed a
pattern based on conserved positions in the central section of the
sequence. A profile was also developed that spans the complete length
of the ubiquitin domain.

Description of pattern(s) and/or profile(s)

Consensus pattern K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]
Sequences known to belong to this class detected by the pattern
ALL, except for the RAD23 and SMT3 subfamilies, BAG-1 and
SAP 114.

Other sequence(s) detected in SWISS-PROT NONE.

Sequences known to belong to this class detected by the profile
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Note: this documentation entry is linked to both a signature pattern
and a profile. As the profile is much more sensitive than the
pattern, you should use it if you have access to the necessary
software tools to do so.
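
As an illustration of how a consensus pattern of this kind can be applied to a sequence, the following minimal sketch translates the pattern notation used in these entries into a regular expression and scans one sequence. The function name prosite_to_regex is hypothetical, and the 76-residue human ubiquitin sequence is quoted from memory as an assumption; it is not taken from this document and should be checked against a sequence database before any real use.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern string into a Python regex."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        # Separate the residue spec from an optional repeat such as (3) or (5,8).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        spec, low, high = m.groups()
        if spec == "x":
            regex = "."                       # any residue
        elif spec.startswith("{"):
            regex = "[^" + spec[1:-1] + "]"   # residues that are not allowed
        elif spec.startswith("["):
            regex = spec                      # set of allowed residues
        else:
            regex = re.escape(spec)           # one fixed residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

# Consensus pattern as given in the ubiquitin entry above.
UBIQUITIN_PATTERN = (
    "K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-"
    "[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]"
)

# Assumption: human ubiquitin (76 residues), supplied only for illustration.
HUMAN_UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

match = re.search(prosite_to_regex(UBIQUITIN_PATTERN), HUMAN_UBIQUITIN)
if match:
    # Report 1-based start and end positions of the matched segment.
    print(match.start() + 1, match.end())  # prints: 27 52
```

A regular expression only reproduces the pattern; the profile mentioned in the note above needs dedicated profile-search software and is not covered by this sketch.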
Last update

July 1998 / Text revised.

References

[1]

Jentsch S., Seufert W., Hauser H.-P.
Biochim. Biophys. Acta 1089:127-139(1991).

[2]

Monia B.P., Ecker D.J., Crooke S.T.
Bio/Technology 8:209-215(1990).

[3]

Finley D., Varshavsky A.
Trends Biochem. Sci. 10:343-347(1985).

[4]

Filippi M., Tribioli C., Toniolo D.
Genomics 7:453-457(1990).

[5]

Olvera J., Wool I.G.
J. Biol. Chem. 268:17967-17974(1993).

[6]

Kumar S., Yoshida Y., Noda M.
Biochem. Biophys. Res. Commun. 195:393-399(1993).

[7]

Jones D., Candido E.P.
J. Biol. Chem. 268:19545-19551(1993).

[8]

Melnick L., Sherman F.
J. Mol. Biol. 233:372-388(1993).


UPF0004 


PDOC00984 


Uncharacterized protein 
family UPF0004 
signature 


The following uncharacterized proteins have been shown [1] to share
regions of similarities:

- Escherichia coli hypothetical protein yliG.

- Escherichia coli hypothetical protein yleA and HI0019, the
  corresponding Haemophilus influenzae protein.

- Bacillus subtilis hypothetical protein yqeV. 

- Helicobacter pylori hypothetical protein HP0269. 

- Helicobacter pylori hypothetical protein HP0285. 

- Mycoplasma iowae hypothetical protein in 16S RNA 5'region. 

- Mycobacterium tuberculosis hypothetical protein Rv2733c. 

- Rickettsia prowazekii hypothetical protein RP416. 

- Rickettsia prowazekii hypothetical protein RP808. 

- Synechocystis strain PCC 6803 hypothetical protein slr0082. 

- Synechocystis strain PCC 6803 hypothetical protein sll0996.

- Methanococcus jannaschii hypothetical protein MJ0865.

- Methanococcus jannaschii hypothetical protein MJ0867.

- Caenorhabditis elegans hypothetical protein F25B5.5.

The size of these proteins ranges from 47 to 61 Kd. They contain six
conserved cysteines, three of which are clustered in a region that
can be used as a signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVMT]-x(4)-G

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 2.

Last update

December 1999 / Pattern and text revised. 

References 

[1]

Bairoch A.

Accession number: PF01554

Definition: Uncharacterized membrane protein family UPF0013

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_163 (release 4.0)

Gathering cutoffs: -26 -26

Trusted cutoffs: -16.10 -16.10

Noise cutoffs: -36.70 -36.70

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Database Reference: URL; http://www.expasy.ch/cgi-bin/lists?upflist.txt;

Database Reference INTERPRO; IPR002528;

Database reference: PFAMB; PB041103;

Comment: These proteins are integral membrane proteins of unknown
Comment: function.

Number of members: 47


UPF0019 


PDOC00949 


Uncharacterized protein 
family UPF0019 
signature 


The following uncharacterized proteins have been shown [1,2] to be
highly similar:

- Yeast protein SNZ1, which may be involved in growth arrest and
  cellular response to nutrient limitation.

- Yeast chromosome VI hypothetical protein YFL059w.

- Yeast chromosome XIV hypothetical protein YNL333w.

- Fission yeast hypothetical protein SpAC29B12.04.

- Hevea brasiliensis ethylene-inducible protein HEVER.

- Stellaria longipes hypothetical protein H47.

- Bacillus subtilis hypothetical protein yaaD.

- Haemophilus influenzae hypothetical protein HI1647.

- Mycobacterium leprae hypothetical protein MLCL581.12c.

- Mycobacterium tuberculosis hypothetical protein MtCY1A10.27.

- Archaeoglobus fulgidus hypothetical protein AF0508.

- Methanococcus jannaschii hypothetical protein MJ0677.

- Methanococcus vannielii hypothetical protein in tRNA/5S rRNA gene
  cluster.

- Methanobacterium thermoautotrophicum hypothetical protein MTH666.

These are hydrophilic proteins of about 32 Kd. They can be picked up
in the database by the following pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern L-P-V-[VT]-[NQL]-F-[AT]-A-G-Q-[LIV]-A-T-P-A-D-A-A-[LM]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

July 1998 / Pattern and text revised.

References

[1]

Sivasubramaniam S., Vanniasingham V.M., Tan C.T., Chua N.H.
Plant Mol. Biol. 29:173-178(1995).

[2]

Braun E.L., Fuge E.K., Padilla P.A., Werner-Washburne M.
J. Bacteriol. 178:6865-6872(1996).


UPF0047 


PDOC01018 


Uncharacterized protein 
family UPF0047 
signature 


The following uncharacterized proteins have been shown [1] to be
highly similar:

- Bacillus subtilis hypothetical protein yugU.

- Escherichia coli hypothetical protein yjbQ.

- Mycobacterium tuberculosis hypothetical protein MtCY9C4.12.

- Synechocystis strain PCC 6803 hypothetical protein sll1880.

- Archaeoglobus fulgidus hypothetical protein AF2050.

- Methanococcus jannaschii hypothetical protein MJ1081.

- Methanobacterium thermoautotrophicum hypothetical protein MTH771.

- Fission yeast hypothetical protein SpAC4A8.02c.

These are small proteins of 14 to 16 Kd. They can be picked up in the
database by the following pattern. This pattern is located in the
C-terminal part of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIV]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

July 1998 / First entry.

References

[1]

Bairoch A. 

Unpublished observations (1998). 


UPF0052 




Uncharacterised protein 
family UPF0052 


Accession number: PF01933 

Definition: Uncharacterised protein family UPF0052 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 263.90 263.90 

Noise cutoffs: -134.40 -134.40 

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM 

Database Reference INTERPRO; IPR002882; 

Number of members: 12
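
The field layout of a family annotation such as the one above lends itself to simple machine reading. The following is a minimal sketch, assuming each field sits on one line as "name: value"; the helper names parse_annotation and cutoff_pair are hypothetical and are not part of any Pfam tool. Lines without a colon, such as the INTERPRO database reference, are simply skipped by this sketch.

```python
def parse_annotation(text: str) -> dict:
    """Collect 'name: value' lines into a dict of lists (names may repeat)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # e.g. 'Database Reference INTERPRO; IPR002882;'
        name, value = line.split(":", 1)
        fields.setdefault(name.strip(), []).append(value.strip())
    return fields

def cutoff_pair(fields: dict, name: str) -> tuple:
    """Return the two numbers listed on a cutoff line as floats."""
    first, second = fields[name][0].split()
    return float(first), float(second)

# Field values copied from the UPF0052 entry above.
UPF0052 = """\
Accession number: PF01933
Definition: Uncharacterised protein family UPF0052
Gathering cutoffs: 25 25
Trusted cutoffs: 263.90 263.90
Noise cutoffs: -134.40 -134.40
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate -seed 0 HMM
Database Reference INTERPRO; IPR002882;
Number of members: 12
"""

fields = parse_annotation(UPF0052)
print(fields["Accession number"][0])           # prints: PF01933
print(cutoff_pair(fields, "Trusted cutoffs"))  # prints: (263.9, 263.9)
print(len(fields["HMM build command line"]))   # prints: 2
```

Values are kept as lists because some field names, such as the HMM build command line, occur more than once in an entry.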


UPF0057 


PDOC01013 


Uncharacterized protein 
family UPF0057 
signature 


The following uncharacterized proteins have been shown [1] to be
evolutionarily related:

- Barley low-temperature induced protein blt101.

- Lophopyrum elongatum salt-stress induced protein ESI3.

- Yeast hypothetical proteins YDL123w, YDR276c, YDR525Bw and YJL151c.

- Caenorhabditis elegans hypothetical proteins F47B7.1, T23F2.3,
  T23F2.4, T23F2.5 and ZK632.10.

- Escherichia coli hypothetical protein yqaE.

- Synechocystis strain PCC 6803 hypothetical protein ssr1169.

These are small proteins of from 52 to 140 amino-acid residues that
contain two transmembrane domains. As a signature pattern we selected
a region that corresponds to the end of the first transmembrane
helix.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

July 1998 / First entry.

References

[1]

Rudd K.E., Humphery-Smith I., Wasinger V.C., Bairoch A.
Electrophoresis 19:536-544(1998).


UPF0066 


PDOC01022 


Uncharacterized protein 
family UPF0066 
signature 


The following uncharacterized proteins have been shown [1] to be
evolutionarily related:

- Escherichia coli hypothetical protein yaeB and HI0510, the
  corresponding Haemophilus influenzae protein.

- Agrobacterium tumefaciens Ti plasmid protein virR.

- Pseudomonas aeruginosa protein rcsF.

- Archaeoglobus fulgidus hypothetical protein AF0241.

- Archaeoglobus fulgidus hypothetical protein AF0433.

- Methanococcus jannaschii hypothetical protein MJ1583.

- Methanobacterium thermoautotrophicum hypothetical protein MTH1797.

These are proteins of from 120 to 240 amino-acid residues (with the
exception of AF0433, which is 366 residues long). As a signature
pattern we selected a conserved region in the central part of these
proteins.
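
Some entries in this listing give protein sizes in Kd and others, like the one above, in residues. To compare them, a rough conversion can be sketched as follows. It assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated anywhere in this document; the function name approx_mass_kd is hypothetical, and actual masses depend on amino-acid composition.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average, not from the source

def approx_mass_kd(n_residues: int, avg_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Rough molecular mass in Kd for a chain of n_residues amino acids."""
    return n_residues * avg_da / 1000.0

# Residue counts quoted in the UPF0066 entry above.
for n in (120, 240, 366):
    print(n, "residues ~", round(approx_mass_kd(n), 1), "Kd")
```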

Description of pattern(s) and/or profile(s)

Consensus pattern G-[AV]-F-[STA]-x-R-[SA]-x(2)-R-P-N
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

July 1999 / First entry.

References 

[1] 

Bairoch A. 

Unpublished observations (1998). 


UPF0076 


PDOC00838 


Uncharacterized protein 
family UPF0076 
signature 


The following uncharacterized proteins have been shown [1] to share
regions of similarities:

- Goat antigen UK114, a human homolog and the corresponding rat
  protein, which is known as perchloric acid soluble protein (PSP1).
  PSP1 [2] may inhibit an initiation stage of cell-free protein
  synthesis.

- Mouse heat-responsive protein HRSP12.

- Yeast chromosome V hypothetical protein YER057c.

- Yeast chromosome IX hypothetical protein YIL051c.

- Caenorhabditis elegans hypothetical protein C23G10.2.

- Escherichia coli hypothetical protein ycdK.

- Escherichia coli hypothetical protein yhaR.

- Escherichia coli hypothetical protein yjgF and HI0719, the
  corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yoaB.

- Bacillus subtilis hypothetical protein yabJ.

- Haemophilus influenzae hypothetical protein HI1627.

- Helicobacter pylori hypothetical protein HP0944.

- Lactococcus lactis aldR.

- Myxococcus xanthus dfrA.

- Synechocystis strain PCC 6803 hypothetical protein slr0709.

- Rhizobium strain NGR234 symbiotic plasmid hypothetical protein
  y4sK.

- Pyrococcus horikoshii hypothetical protein PH0854.

These are small proteins of around 15 Kd whose sequence is highly
conserved. As a signature pattern, we selected a well conserved
region located in the C-terminal part of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 4.

Last update

July 1999 / Pattern and text revised.

References

[1]

Bairoch A.

Unpublished observations (1995).

[2]

Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz 
S., Natori Y. 

J. Biol. Chem. 270:30060-30067(1995). 


UPF0099 




Domain of unknown 
function UPF0099 


Accession number: PF01981

Definition: Domain of unknown function UPF0099

Previous Pfam IDs: DUF119;

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Enright A

Gathering cutoffs: 25 25

Trusted cutoffs: 132.80 132.80

Noise cutoffs: -35.70 -35.70

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Database Reference INTERPRO; IPR002833;

Comment: This domain has no known function.

Number of members: 10
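
The three cutoff lines in an entry like the one above can be read as score thresholds. The sketch below uses an interpretation that is an assumption here, since this document does not define the fields: hits scoring at or above the gathering cutoff are taken as family members, the trusted cutoff is the lowest score among accepted members, and the noise cutoff is the highest score among rejected hits. The class name Cutoffs and its methods are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cutoffs:
    gathering: float
    trusted: float
    noise: float

    def is_consistent(self) -> bool:
        # Under the assumed reading, noise < gathering <= trusted.
        return self.noise < self.gathering <= self.trusted

    def classify(self, bit_score: float) -> str:
        """Place a hit's bit score relative to the three thresholds."""
        if bit_score >= self.trusted:
            return "member, at or above trusted cutoff"
        if bit_score >= self.gathering:
            return "member, at or above gathering cutoff"
        if bit_score > self.noise:
            return "not a member, above noise cutoff"
        return "not a member, at or below noise cutoff"

# Per-sequence values copied from the UPF0099 entry above.
upf0099 = Cutoffs(gathering=25.0, trusted=132.80, noise=-35.70)

print(upf0099.is_consistent())  # prints: True
for score in (140.0, 30.0, 0.0, -50.0):
    print(score, "->", upf0099.classify(score))
```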


UQ_con 


PDOC00163 


Ubiquitin-conjugating 
enzymes active site 


Ubiquitin-conjugating enzymes (EC 6.3.2.19) (UBC or E2 enzymes)
[1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from a
ubiquitin-activating enzyme (E1) to E2, which later ligates ubiquitin
directly to substrate proteins with or without the assistance of
'N-end' recognizing proteins (E3).

In most species there are many forms of UBC (at least 9 in yeast)
which are implicated in diverse cellular functions.

A cysteine residue is required for ubiquitin-thiolester formation.
There is a single conserved cysteine in UBCs and the region around
that residue is conserved in the sequence of known UBC isozymes. We
have used that region as a signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]
[C is the active site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for yeast UBC6 (DOA2).

Other sequence(s) detected in SWISS-PROT NONE.

Expert(s) to contact by email

Jentsch S. jentsch@zmbh.uni-heidelberg.de

Last update

July 1998 / Text revised.

References

[1]

Jentsch S., Seufert W., Sommer T., Reins H.-A.
Trends Biochem. Sci. 15:195-198(1990).

[2]

Jentsch S., Seufert W., Hauser H.-P.
Biochim. Biophys. Acta 1089:127-139(1991).

[3]

Hershko A.
Trends Biochem. Sci. 16:265-268(1991).


Urease signatures


Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the
hydrolysis of urea to carbon dioxide and ammonia [1]. Historically,
it was the first enzyme to be crystallized (in 1926). It is mainly
found in plant seeds, microorganisms and invertebrates. In plants,
urease is a hexamer of identical chains. In bacteria [2], it consists
of either two or three different subunits (alpha, beta and gamma).
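
The hydrolysis named above can be written as a balanced equation. The stoichiometry below is standard chemistry supplied here for illustration rather than taken from this document; the two-step form through carbamic acid matches the description of the products given in the UreD and UreF annotations later in this listing.

```latex
% Overall reaction: one urea and one water give carbon dioxide and two ammonia.
\mathrm{CO(NH_2)_2 + H_2O \longrightarrow CO_2 + 2\,NH_3}

% The same reaction in two steps, via carbamic acid:
\mathrm{CO(NH_2)_2 + H_2O \longrightarrow NH_3 + NH_2COOH}
\qquad
\mathrm{NH_2COOH \longrightarrow NH_3 + CO_2}
```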

Urease binds two nickel ions per subunit; four histidines, an
aspartate and a carbamated lysine serve as ligands to these metals;
an additional histidine is involved in the catalytic mechanism [3].

As signatures for this enzyme, we selected a region that contains two
histidines that bind one of the nickel ions and the region of the
active site histidine.



Description of pattern(s) and/or profile(s) 

Consensus pattern T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H- 
x(3)-P [The two H's bind nickel] 

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A
[H is the active site residue]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Last update

November 1997 / Patterns and text revised. 

References 

[1] 

Takishima K., Suga T., Mamiya G.
Eur. J. Biochem. 175:151-165(1988).

[2]

Mobley H.L.T., Hausinger R.P.
Microbiol. Rev. 53:85-108(1989).

[3] 

Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. 
Science 268:998-1004(1995). 



UreD urease accessory 
protein 



Accession number: PF01774

Definition: UreD urease accessory protein

Author: Bashton M, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_109 (release 4.2)

Gathering cutoffs: 25 25

Trusted cutoffs: 186.00 186.00

Noise cutoffs: -42.60 -42.60

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 97352660

Reference Title: Characterization of UreG, identification of a
Reference Title: UreD-UreF-UreG complex, and evidence suggesting that a
Reference Title: nucleotide-binding site in UreG is required for in vivo
Reference Title: metallocenter assembly of Klebsiella aerogenes urease.

Reference Author: Moncrief MB, Hausinger RP;
Reference Location: J Bacteriol 1997;179:4081-4086.

Reference Number: [2]

Reference Medline: 96146510

Reference Title: Organization of Ureaplasma urealyticum urease gene cluster
Reference Title: and expression in a suppressor strain of Escherichia coli.

Reference Author: Neyrolles O, Ferris S, Behbahani N, Montagnier L, Blanchard
Reference Author: A;

Reference Location: J Bacteriol 1996;178:647-655.

Reference Number: [3]

Reference Medline: 94211837

Reference Title: In vitro activation of urease apoprotein and role of UreD
Reference Title: as a chaperone required for nickel metallocenter assembly.

Reference Author: Park IS, Carr MB, Hausinger RP;
Reference Location: Proc Natl Acad Sci U S A 1994;91:3233-3237.

Database Reference INTERPRO; IPR002669;

Comment: UreD is a urease accessory protein. Urease hydrolyses
Comment: urea into ammonia and carbamic acid [2]. UreD is involved in
Comment: activation of the urease enzyme via the UreD-UreF-UreG-urease complex
Comment: [1] and is required for urease nickel metallocenter assembly [3].

Comment: See also UreF (UreF) and UreG (HypB_UreG).

Number of members: 23


UreF 




UreF 


Accession number: PF01730 

Definition: UreF 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_2037 (release 4.1)

Gathering cutoffs: -31 -31

Trusted cutoffs: -14.30 -14.30

Noise cutoffs: -49.30 -49.30

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 96404789

Reference Title: Purification and activation properties of UreD-UreF-urease
Reference Title: apoprotein complexes.

Reference Author: Moncrief MB, Hausinger RP;
Reference Location: J Bacteriol 1996;178:5417-5421.

Reference Number: [2]

Reference Medline: 96146510

Reference Title: Organization of Ureaplasma urealyticum urease gene cluster
Reference Title: and expression in a suppressor strain of Escherichia coli.

Reference Author: Neyrolles O, Ferris S, Behbahani N, Montagnier L, Blanchard
Reference Author: A;

Reference Location: J Bacteriol 1996;178:647-655.

Database Reference INTERPRO; IPR002639;

Comment: This family consists of the urease accessory protein
Comment: UreF. The urease enzyme (urea amidohydrolase)
Comment: hydrolyses urea into ammonia and carbamic acid [2].
Comment: UreF is proposed to modulate the activation process of
Comment: urease by eliminating the binding of nickel ions to
Comment: noncarbamylated protein [1].

Number of members: 20


Vif 




Retroviral Vif (Viral
infectivity) protein


Accession number: PF00559

Definition: Retroviral Vif (Viral infectivity) protein

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Swiss-Prot

Gathering cutoffs: 25 25

Trusted cutoffs: 53.90 53.90

Noise cutoffs: 23.60 23.60

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 95287525

Reference Title: Aberrant Gag protein composition of a human
Reference Title: immunodeficiency virus type 1 vif mutant produced in
Reference Title: primary lymphocytes.

Reference Author: Simm M, Shahabuddin M, Chao W, Allan JS, Volsky DJ;

Reference Location: J Virol 1995;69:4582-4586.

Database Reference INTERPRO; IPR000475;

Comment: -!- Human immunodeficiency virus type 1 (HIV-1) Vif is required for
Comment: productive infection of T lymphocytes and macrophages. Virions
Comment: produced in the absence of Vif have abnormal core morphology and
Comment: those produced in primary T cells carry immature core proteins
Comment: and low levels of mature capsid.

Number of members: 503


Vpu 




Vpu protein 


Accession number: PF00558 

Definition: Vpu protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Swiss-Prot 

Gathering cutoffs: 15 15

Trusted cutoffs: 15.50 15.50

Noise cutoffs: 13.60 13.60

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 97479365

Reference Title: Enhancement of retroviral production from packaging cell
Reference Title: lines expressing the human immunodeficiency type 1 VPU
Reference Title: gene.

Reference Author: Kobinger GP, Mouland AJ, Lalonde JP, Forget J, Cohen EA;

Reference Location: Gene Ther 1997;4:868-874.

Reference Number: [2]

Reference Medline: 95156576

Reference Title: The human immunodeficiency virus type 1 Vpu protein
Reference Title: specifically binds to the cytoplasmic domain of CD4:
Reference Title: implications for the mechanism of degradation.

Reference Author: Bour S, Schubert U, Strebel K;
Reference Location: J Virol 1995;69:1510-1520.

Reference Number: [3]

Reference Medline: 97325981

Reference Title: Secondary structure and tertiary fold of the human
Reference Title: immunodeficiency virus protein U (Vpu) cytoplasmic domain
Reference Title: in solution.

Reference Author: Willbold D, Hoffmann S, Rosch P;

Reference Location: Eur J Biochem 1997;245:581-588.

Database Reference: SCOP; 1vpu; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002094;

Database Reference PDB; 1vpu ; 38; 81;

Database reference: PFAMB; PB003303;

Database reference: PFAMB; PB005882;

Comment: -!- The Vpu protein contains an N-terminal transmembrane spanning region
Comment: and a C-terminal cytoplasmic region.
Comment: -!- The HIV-1 Vpu protein stimulates virus production by enhancing
Comment: the release of viral particles from infected cells.
Comment: -!- The VPU protein binds specifically to CD4.

Number of members: 194


XPG_N 


PDOC00658 


XPG protein signatures 


Xeroderma pigmentosum (XP) [1] is a human autosomal recessive
disease, characterized by a high incidence of sunlight-induced skin
cancer. Skin cells of people with this condition are hypersensitive
to ultraviolet light, due to defects in the incision step of DNA
excision repair. There are a minimum of seven genetic complementation
groups involved in this pathway: XP-A to XP-G. The defect in XP-G can
be corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2].

XPG belongs to a family of proteins [2,3,4,5,6] that are composed of
two main subsets:

- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13
  from fission yeast. RAD2 and XPG are single-stranded DNA
  endonucleases [7,8]. XPG makes the 3' incision in human DNA
  nucleotide excision repair [9].

- Subset 2, to which belong mouse and human FEN-1, rad2 from fission
  yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific
  endonuclease.

In addition to the proteins listed in the above groups, this family
also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that
  could act in a pathway that corrects mismatched base pairs.

- Yeast EXO1 (DHS1), a protein with probably the same function as
  exo1.

- Yeast DIN7.

Sequence alignment of this family of proteins reveals that
similarities are largely confined to two regions. The first is
located at the N-terminal extremity (N-region) and corresponds to the
first 95 to 105 amino acids. The second region is internal (I-region)
and found towards the C-terminus; it spans about 140 residues and
contains a highly conserved core of 27 amino acids that includes a
conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the
conserved acidic residues are involved in the catalytic mechanism of
DNA excision repair in XPG. The amino acids linking the N- and
I-regions are not conserved; indeed, they are largely absent from
proteins belonging to the second subset.

We have developed two signature patterns for these proteins. The
first corresponds to the central part of the N-region, the second to
part of the I-region and includes the putative catalytic core
pentapeptide.

Description of pattern(s) and/or profile(s)

Consensus pattern [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Expert(s) to contact by email

Clarkson S.G. clarkson@medecine.unige.ch

Last update

November 1997 / Patterns and text revised.


Y_phosphatase


PDOC00323


Tyrosine specific protein
phosphatases signature
and profiles


Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase)
[1 to 5] are enzymes that catalyze the removal of a phosphate group
attached to a tyrosine residue. These enzymes are very important in
the control of cell growth, proliferation, differentiation and
transformation. Multiple forms of PTPase have been characterized and
can be classified into two categories: soluble PTPases and
transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below:

Soluble PTPases.

- PTPN1 (PTP-1B).

- PTPN2 (T-cell PTPase; TC-PTP).

- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band
  4.1-like domain (see <PDOC00566>) and could act at junctions
  between the membrane and cytoskeleton.

- PTPN5 (STEP).

- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes
  which contain two copies of the SH2 domain at their N-terminal
  extremity. The Drosophila protein corkscrew (gene csw) also belongs
  to this subgroup.

- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).

- PTPN8 (70Z-PEP).

- PTPN9 (MEG2).

- PTPN12 (PTP-G1; PTP-P19).

- Yeast PTP1.

- Yeast PTP2 which may be involved in the ubiquitin-mediated protein
  degradation pathway.

- Fission yeast pyp1 and pyp2 which play a role in inhibiting the
  onset of mitosis.

- Fission yeast pyp3 which contributes to the dephosphorylation of
  cdc2.

- Yeast CDC14 which may be involved in chromosome segregation.

- Yersinia virulence plasmid PTPases (gene yopH).

- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which
  dephosphorylates MAP kinase on both Thr-183 and Tyr-185.

- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases
  ERK1 and ERK2 on both Thr and Tyr residues.

- DUSP3 (VHR).

- DUSP4 (HVH2).

- DUSP5 (HVH3).

- DUSP6 (Pyst1; MKP-3).

- DUSP7 (Pyst2; MKP-X).

- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.

- Yeast YVH1.

- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable
length extracellular domain, followed by a transmembrane region and a
C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases
contain fibronectin type III (FN-III) repeats, immunoglobulin-like
domains, MAM domains or carbonic anhydrase-like domains in their
extracellular region. The cytoplasmic region generally contains two
copies of the PTPase domain. The first seems to have enzymatic
activity, while the second is inactive but seems to affect substrate
specificity of the first. In these domains, the catalytic cysteine is
generally conserved but some other, presumably important, residues
are not.

In the following table, the domain structure of known receptor
PTPases is shown:

                                          Extracellular       Intracellular

                                          Ig  FN-3  CAH  MAM  PTPase

Leukocyte common antigen (LCA) (CD45)      0    2    0    0     2
Leukocyte antigen related (LAR)            3    8    0    0     2
Drosophila DLAR                            3    9    0    0     2
Drosophila DPTP                            2    2    0    0     2
PTP-alpha (LRP)                            0    0    0    0     2
PTP-beta                                   0   16    0    0     1
PTP-gamma                                  0    1    1    0     2
PTP-delta                                  0   >7    0    0     2
PTP-epsilon                                0    0    0    0     2
PTP-kappa                                  1    4    0    1     2
PTP-mu                                     1    4    0    1     2
PTP-zeta                                   0    1    1    0     2

PTPase domains consist of about 300 amino acids. There are two
conserved cysteines; the second one has been shown to be absolutely
required for activity. Furthermore, a number of conserved residues in
its immediate vicinity have also been shown to be important.

We derived a signature pattern for PTPase domains centered on the
active site cysteine.

There are three profiles for PTPases. The first one spans the
complete domain and is not specific to any subtype. The second
profile is specific to dual-specificity PTPases and the third one to
the PTP subfamily.

Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]
[C is the active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except
for nine sequences.

Other sequence(s) detected in SWISS-PROT: 3.

Sequences known to belong to this class detected by the 1st profile: ALL.

Other sequence(s) detected in SWISS-PROT: 2.

Sequences known to belong to this class detected by the 2nd profile: ALL dual
type PTPases.

Other sequence(s) detected in SWISS-PROT: NONE.

Sequences known to belong to this class detected by the 3rd profile: ALL PTP
type PTPases.

Other sequence(s) detected in SWISS-PROT: NONE.

Note: the M-phase inducer phosphatases (cdc25-type phosphatases) are
tyrosine-protein phosphatases that are not structurally related to the above
PTPases.

Note: this documentation entry is linked to both a signature pattern and to
profiles. As profiles are much more sensitive than the pattern, you should use
them if you have access to the necessary software tools to do so.

Last update: July 1999 / Text revised.
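
As an illustration of how the PTPase consensus pattern given above can be applied, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and reports the position of the active site cysteine. The function names and the toy sequence are hypothetical and are not part of the original entry; only the pattern itself is taken from the text.

```python
import re

# Consensus pattern from the entry above; C is the active site residue.
PTPASE_PATTERN = "[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    tokens = []
    for element in pattern.split("-"):
        # Separate the residue part from an optional repeat such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            token = residue                      # single letter or [allowed set]
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        tokens.append(token)
    return "".join(tokens)


def active_site_cysteines(protein: str) -> list:
    """Return 0-based positions of the cysteine in each pattern match."""
    regex = re.compile(prosite_to_regex(PTPASE_PATTERN))
    # The cysteine is the third element of the pattern, hence the offset of 2.
    return [m.start() + 2 for m in regex.finditer(protein)]


# Hypothetical toy sequence constructed to contain one match.
toy_sequence = "MKAAVHCSAGVGRTGTFIAL"
print(prosite_to_regex(PTPASE_PATTERN))
print(active_site_cysteines(toy_sequence))
```

A pattern scan of this kind is less sensitive than the profiles mentioned in the entry, so it is best treated as a quick first screen.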
Zein




Zein seed storage 
protein 


Accession number: PF01559 

Definition: Zein seed storage protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_181 (release 4.0)

Gathering cutoffs: -21 -21 

Trusted cutoffs: 4.60 4.60 

Noise cutoffs: -46.60 -46.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1]

Reference Medline: 93197294

Reference Title: Studies of the zein-like alpha-prolamins based on an
Reference Title: analysis of amino acid sequences: implications for their
Reference Title: evolution and three-dimensional structure.
Reference Author: Garratt R, Oliva G, Caracelli I, Leite A, Arruda P;
Reference Location: Proteins 1993;15:88-99.
Database Reference INTERPRO; IPR002530;
Comment: Zeins are seed storage proteins. They are unusually rich in
Comment: glutamine, proline, alanine, and leucine residues and their
Comment: sequences show a series of tandem repeats [1].

Number of members: 48 


zf-AN1 




AN1-like Zinc finger 

- 


Accession number: PF01428
Definition: AN1-like Zinc finger
Author: Bateman A, SMART
Alignment method of seed: Manual
Source of seed members: SMART
Gathering cutoffs: 16 16
Trusted cutoffs: 16.40 16.40
Noise cutoffs: 7.30 7.30
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 93292985
Reference Title: Two related localized mRNAs from Xenopus laevis encode
Reference Title: ubiquitin-like fusion proteins.
Reference Author: Linnen JM, Bailey CP, Weeks DL;
Reference Location: Gene 1993;128:181-188.
Database reference: SMART; ZnF_AN1;
Database Reference INTERPRO; IPR000058;
Comment: Zinc finger at the C-terminus of An1 Swiss:Q91889, a ubiquitin-like
Comment: protein in Xenopus laevis.
Comment: The following pattern describes the zinc finger.
Comment: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C
Comment: Where X can be any amino acid, and numbers in brackets
Comment: indicate the number of residues.
Number of members: 18


zf-B_box 


PDOC50015 


B-box zinc finger 


Accession number: PF00643 

Definition: B-box zinc finger. 

Author: Bateman A 

Alignment method of seed: pftools 

Source of seed members: Prosite 

Gathering cutoffs: 25 25 

Trusted cutoffs: 26.00 26.00 

Noise cutoffs: 24.50 29.90 

HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Database Reference: SCOP; 1fre; fa; [SCOP-USA][CATH-PDBSUM]
Database reference: PROSITE_PROFILE; PS50119;
Database Reference: PROSITE; PDOC50015
Database Reference INTERPRO; IPR002991;
Database Reference PDB; 1fre; 4; 42;
Database reference: PFAMB; PB002777;
Database reference: PFAMB; PB010625;
Database reference: PFAMB; PB041771;
Number of members: 44 


zf-CONSTANS 




CONSTANS family zinc 
finger 


Accession number: PF01760
Definition: CONSTANS family zinc finger
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1072 (release 4.2)
Gathering cutoffs: 25 10
Trusted cutoffs: 76.10 17.20
Noise cutoffs: 9.70 9.70
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 95211836
Reference Title: The CONSTANS gene of Arabidopsis promotes flowering and
Reference Title: encodes a protein showing similarities to zinc finger
Reference Title: transcription factors.
Reference Author: Putterill J, Robson F, Lee K, Simon R, Coupland G;
Reference Location: Cell 1995;80:847-857.
Database Reference INTERPRO; IPR002926; 
Number of members: 45 


zf-DHHC 




DHHC zinc finger domain


Accession number: PF01529
Definition: DHHC zinc finger domain
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_945 (release 4.0)
Gathering cutoffs: 22 22
Trusted cutoffs: 22.40 22.40
Noise cutoffs: -22.40 -22.40
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 99250263
Reference Title: The drosophila STAM gene homolog is in a tight gene
Reference Title: cluster, and its expression correlates to that of the
Reference Title: adjacent gene ial.
Reference Author: Mesilaty-Gross S, Reich A, Motro B, Wides R;
Reference Location: Gene 1999;231:173-186.
Reference Number: [2]
Reference Medline: 97315340
Reference Title: Variations of the C2H2 zinc finger motif in the yeast
Reference Title: genome and classification of yeast zinc finger proteins.
Reference Author: Bohm S, Frishman D, Mewes HW;
Reference Location: Nucleic Acids Res 1997;25:2464-2469.
Reference Number: [3]
Reference Medline: 99321009
Reference Title: The DHHC domain: a new highly conserved cysteine-rich
Reference Title: motif.
Reference Author: Putilina T, Wong P, Gentleman S;
Reference Location: Mol Cell Biochem 1999;195:219-226.
Reference Number: [4]
Reference Medline: 10490616
Reference Title: Erf2, a Novel Gene Product That Affects the Localization
Reference Title: and Palmitoylation of Ras2 in Saccharomyces cerevisiae.
Reference Author: Bartels DJ, Mitchell DA, Dong X, Deschenes RJ;
Reference Location: Mol Cell Biol 1999;19:6775-6787.
Database Reference INTERPRO; IPR001594;
Comment: This domain is also known as NEW1 [2]. 
This domain is 

Comment: predicted to be a zinc binding domain. The 
function 

Comment: of this domain is unknown, but it has been 
predicted to 

Comment: be involved in protein-protein or protein-DNA 
Comment: interactions [3]. 
Number of members: 34 






zf-MYND 




MYND finger 


Accession number: PF01753
Definition: MYND finger
Author: Bateman A
Alignment method of seed: Manual
Source of seed members: Bateman A
Gathering cutoffs: 11 11
Trusted cutoffs: 17.30 17.30
Noise cutoffs: 5.50 5.50
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 96203118
Reference Title: DEAF-1, a novel protein that binds an essential region in a
Reference Title: Deformed response element.
Reference Author: Gross CT, McGinnis W;
Reference Location: EMBO J 1996;15:1961-1970.
Reference Number: [2] 
Reference Medline: you/yuoa 

Reference Title: Molecular cloning, sequence analysis, expression, and
Reference Title: tissue distribution of suppressin, a novel suppressor of
Reference Title: cell cycle entry.
Reference Author: LeBoeuf RD, Ban EM, Green MM, Stone AS, Propst SM, Blalock
Reference Author: JE, Tauber JD;
Reference Location: J Biol Chem 1998;273:361-368.
Database Reference INTERPRO; IPR002893;
Number of members: 48


carbOpept

PDOC00123

Zinc carboxypeptidases, zinc-binding regions signatures




There are a number of different types of zinc-dependent carboxypeptidases (EC
3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally
related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can
  remove all C-terminal amino acids with the exception of Arg, Lys and Pro.
- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a
  specificity similar to that of carboxypeptidase A1, but with a preference
  for bulkier C-terminal residues.
- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but
  that preferentially removes C-terminal Arg and Lys.
- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase),
  a plasma enzyme which protects the body from potent vasoactive and
  inflammatory peptides containing C-terminal Arg or Lys (such as kinins or
  anaphylatoxins) which are released into the circulation.
- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or
  carboxypeptidase E), an enzyme located in secretory granules of pancreatic
  islets, adrenal gland, pituitary and brain. This enzyme removes residual
  C-terminal Arg or Lys remaining after initial endoprotease cleavage during
  prohormone processing.
- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific
  enzyme. It is ideally situated to act on peptide hormones at local tissue
  sites where it could control their activity before or after interaction with
  specific plasma membrane receptors.
- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity
  similar to that of carboxypeptidase A, but found in the secretory granules
  of mast cells.
- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which
  combines the specificities of mammalian carboxypeptidases A and B.
- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4],
  which also combines the specificities of carboxypeptidases A and B.
- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems
  to regulate transcription by cleavage of other transcriptional proteins.
- Yeast hypothetical protein YHR132c.

All of these enzymes bind an atom of zinc. Three conserved residues are
implicated in the binding of the zinc atom: two histidines and a glutamic acid.

We have derived two signature patterns which contain these three zinc-ligands.

Description of pattern(s) and/or profile(s)

Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA]
[H and E are zinc ligands]

Sequences known to belong to this class detected by the pattern: ALL.

Other sequence(s) detected in SWISS-PROT: Bacillus sphaericus endopeptidase I,
which hydrolyses the gamma-D-Glu-(L)meso-diaminopimelic acid bond of spore
cortex peptidoglycan [6] and which is possibly distantly related to zinc
carboxypeptidases.

Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW]
[H is a zinc ligand]

Sequences known to belong to this class detected by the pattern: ALL.

Other sequence(s) detected in SWISS-PROT: 40.

Note: if a protein includes both signatures, the probability of it being a
eukaryotic zinc carboxypeptidase is 100%.

Note: these proteins belong to families M14A/M14B in the classification of
peptidases [7,E1].

Last update: November 1995 / Patterns and text revised.
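
The note above on proteins carrying both signatures can be illustrated with a short Python sketch. The two regular expressions are hand translations of the two consensus patterns; the function name and the toy sequences are hypothetical, built only so that each pattern matches, and are not real carboxypeptidase sequences.

```python
import re

# Hand-translated regular expressions for the two consensus patterns above.
SIGNATURE_1 = re.compile(
    r"[PK].[LIVMFY].[LIVMFY].{4}H[STAG].E.[LIVM][STAG].{6}[LIVMFYTA]"
)
SIGNATURE_2 = re.compile(r"H[STAG].{3}[LIVME].{2}[LIVMFYW]P[FYW]")


def carboxypeptidase_signatures(protein: str) -> tuple:
    """Report which of the two zinc-binding signatures occur in a sequence."""
    has_first = SIGNATURE_1.search(protein) is not None
    has_second = SIGNATURE_2.search(protein) is not None
    return has_first, has_second


# Hypothetical toy fragments, each designed to satisfy one pattern.
first_only = "PALAIAAAAHSAEALSAAAAAAL"
second_only = "HSAAALAALPF"
both = first_only + "GG" + second_only

print(carboxypeptidase_signatures(first_only))
print(carboxypeptidase_signatures(both))
```

Requiring both signatures trades sensitivity for specificity: the second pattern alone also matches 40 unrelated SWISS-PROT sequences according to the entry.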
ZZ




Zinc finger present in 
dystrophin, CBP/p300 


Accession number: PF00569
Definition: Zinc finger present in dystrophin, CBP/p300
Author: SMART
Alignment method of seed: Manual
Source of seed members: Alignment kindly provided by SMART
Gathering cutoffs: 14 14
Trusted cutoffs: 14.60 14.60
Noise cutoffs: 10.90 10.90
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 96402609
Reference Title: ZZ and TAZ: new putative zinc fingers in dystrophin and
Reference Title: other proteins.
Reference Author: Ponting CP, Blake DJ, Davies KE, Kendrick-Jones J, Winder
Reference Author: SJ;
Reference Location: Trends Biochem Sci 1996;21:11-13.
Database Reference: EXPERT; Chris.Ponting@human-anatomy.oxford.ac.uk;
Database Reference INTERPRO; IPR000433;
Database reference: PFAMB; PB041629;
Comment: ZZ in dystrophin binds calmodulin
Comment: Putative zinc finger; binding not yet shown.
Number of members: 87

A A. Activities of Polypeptides Comprising Signal Peptides

Polypeptides comprising signal peptides are a family of proteins that are typically
targeted (1) to a particular organelle or intracellular compartment, (2) to interact with a
particular molecule or (3) for secretion outside of a host cell. Examples of polypeptides
comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

These proteins comprising signal peptides are useful to modulate ligand-receptor
interactions, cell-to-cell communication, signal transduction, intracellular communication,
and activities and/or chemical cascades that take place in an organism outside or within any
particular cell.

One class of such proteins is soluble proteins, which are transported out of the cell.
These proteins can act as ligands that bind to a receptor to trigger signal transduction or to
permit communication between cells.

Another class is receptor proteins which also comprise a retention domain that lodges 
the receptor protein in the membrane when the cell transports the receptor to the surface of 
the cell. Like the soluble ligands, receptors can also modulate signal transduction and 
communication between cells. 

In addition the signal peptide itself can serve as a ligand for some receptors. An 
example is the interaction of the ER targeting signal peptide with the signal recognition 
particle (SRP). Here, the SRP binds to the signal peptide, halting translation, and the 
resulting SRP complex then binds to docking proteins located on the surface of the ER, 
prompting transfer of the protein into the ER. 

Signal peptide residue composition is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production

It is contemplated that polynucleotides of the invention can be incorporated into a 
host cell or in-vitro system to modulate polypeptide production. For instance, the SDFs 
prepared as described herein can be used to prepare expression cassettes useful in a number of 
techniques for suppressing or enhancing expression. 

For example, polynucleotides of the present invention comprising sequences to be
transcribed, such as coding sequences, can be inserted into nucleic acid constructs to
modulate polypeptide production. Typically, such sequences to be transcribed are
heterologous to at least one element of the nucleic acid construct to generate a chimeric gene
or construct.

Another example of useful polynucleotides is nucleic acid molecules comprising
regulatory sequences of the present invention. Chimeric genes or constructs can be generated
when the regulatory sequences of the invention are linked to heterologous sequences in a vector
construct. Within the scope of the invention are such chimeric genes and/or constructs.

Also within the scope of the invention are nucleic acid molecules, whereof at least a part 
or fragment of these DNA molecules are presented in Tables 1 and 2 of the present application, 
and wherein the coding sequence is under the control of its own promoter and/or its own 
regulatory elements. Such molecules are useful for transforming the genome of a host cell or an 
organism regenerated from said host cell for modulating polypeptide production. 

Additionally, a vector capable of producing the oligonucleotide can be inserted into the 
host cell to deliver the oligonucleotide. 

Components to be included in vector constructs are described in more detail both above
and below.

Whether the chimeric vectors or native nucleic acids are utilized, such 
polynucleotides can be incorporated into a host cell to modulate polypeptide production. 
Native genes and/or nucleic acid molecules can be effective when exogenous to the host cell. 

Methods of modulating polypeptide expression include, without limitation:

Suppression methods, such as
    Antisense
    Ribozymes
    Co-suppression
    Insertion of Sequences into the Gene to be Modulated
    Regulatory Sequence Modulation

as well as Methods for Enhancing Production, such as
    Insertion of Exogenous Sequences; and
    Regulatory Sequence Modulation.

III.A. Suppression

Expression cassettes of the invention can be used to suppress expression of
endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful,
for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437
(1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et al.,
Nature 357:384-387 (1992)).

As described above, a number of methods can be used to inhibit gene expression in 
plants, such as antisense, ribozyme, introduction of exogenous genes into a host cell, 
insertion of a polynucleotide sequence into the coding sequence and/or the promoter of the 
endogenous gene of interest, and the like. 

III.A.1. Antisense

An expression cassette as described above can be transformed into a host cell or
plant to produce an antisense strand of RNA. For plant cells, antisense RNA inhibits gene
expression by preventing the accumulation of mRNA which encodes the enzyme of interest; see,
e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al., U.S. Patent No.
4,801,340.
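
The relationship between a coding sequence and its antisense RNA can be sketched in a few lines of Python. This is only an illustration of strand complementarity; the function names and the six-base example are hypothetical, and the sketch says nothing about how an actual antisense construct is designed.

```python
# Translation table for complementing the four DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def antisense_rna(coding_strand: str) -> str:
    """Return the RNA (5' to 3') complementary to the mRNA of a coding strand.

    The mRNA has the same sequence as the coding strand (with U for T), so the
    antisense RNA is the reverse complement of the coding strand with U for T.
    """
    return reverse_complement(coding_strand).replace("T", "U")


# Hypothetical six-base coding fragment.
print(antisense_rna("ATGGCC"))
```

For the fragment ATGGCC the mRNA is AUGGCC and the antisense RNA is GGCCAU, which base-pairs with the mRNA along its full length.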

III.A.2. Ribozymes 

Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA 
and down-regulate translation. 

III.A.3. Co-Suppression

Another method of suppression is by introducing an exogenous copy of the gene
to be suppressed. Introduction of expression cassettes in which a nucleic acid is configured in
the sense orientation with respect to the promoter has been shown to prevent the accumulation of
mRNA. This method is described in detail above.



III.A.4. Insertion of Sequences into the Gene to be Modulated

Yet another means of suppressing gene expression is to insert a polynucleotide
into the gene of interest to disrupt transcription or translation of the gene.

Homologous recombination could be used to target a polynucleotide insert to a
gene using the Cre-Lox system (A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C.
Vergunst et al., Plant Mol. Biol. 38:393 (1998), R Albert et al., Plant J. 7:649 (1995)).

In addition, random insertion of polynucleotides into a host cell genome can also
be used to disrupt the gene of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)).
In this method, screening for clones from a library containing random insertions is preferred for
identifying those that have polynucleotides inserted into the gene of interest. Such screening can
be performed using probes and/or primers described above based on sequences from Tables 1
and 2, fragments thereof, and substantially similar sequences thereto. The screening can also be
performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

The SDFs described in Tables 1 and 2, and fragments thereof, are examples of
nucleotides of the invention that contain regulatory sequences that can be used to suppress or
inactivate transcription and/or translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

When suppression of production of the endogenous, native protein is desired, it
is often helpful to express a gene comprising a dominant-negative mutation. Protein variants
produced from genes comprising dominant-negative mutations are a useful tool for research.
Genes comprising dominant-negative mutations can produce a variant
polypeptide which is capable of competing with the native polypeptide, but which does not
produce the native result. Consequently, over-expression of genes comprising these mutations
can titrate out an undesired activity of the native protein. For example, the product from a
gene comprising a dominant-negative mutation of a receptor can be used to constitutively
activate or suppress a signal transduction cascade, allowing examination of the phenotype
and thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising
from the gene comprising a dominant-negative mutation can be an inactive enzyme still capable
of binding to the same substrate as the native protein and therefore competing with such native
protein.

Products from genes comprising dominant-negative mutations can also act upon
the native protein itself to prevent activity. For example, the native protein may be active only
as a homo-multimer or as one subunit of a hetero-multimer. Incorporation of an inactive subunit
into the multimer with native subunit(s) can inhibit activity.
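
The effect of an inactive subunit on a multimeric protein can be made concrete with a small calculation. The sketch below rests on two simplifying assumptions that are not stated in the text: subunits assemble at random, and a single inactive subunit is enough to inactivate the whole multimer. Under those assumptions the fraction of fully native multimers is (1 - f)^n, where f is the fraction of inactive subunits and n is the number of subunits per multimer. The function name is hypothetical.

```python
def fraction_active_multimers(inactive_fraction: float, subunits: int) -> float:
    """Fraction of multimers containing only native subunits.

    Assumes random assembly and that one inactive subunit abolishes activity;
    both are illustrative assumptions, not claims from the text.
    """
    if not 0.0 <= inactive_fraction <= 1.0:
        raise ValueError("inactive_fraction must be between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - inactive_fraction) ** subunits


# Equal amounts of native and inactive subunits in a tetramer.
print(fraction_active_multimers(0.5, 4))
```

With equal amounts of native and inactive subunits, a tetramer retains about 6% fully native complexes under these assumptions, which is why a dominant-negative variant can suppress activity far more than its share of the subunit pool would suggest.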

Thus, gene function can be modulated in host cells of interest by inserting into
these cells vector constructs comprising a gene comprising a dominant-negative mutation.

III.B. Enhanced Expression

Enhanced expression of a gene of interest in a host cell can be accomplished by either
(1) insertion of an exogenous gene; or (2) promoter modulation.

III.B.1. Insertion of an Exogenous Gene

Insertion of an expression construct encoding an exogenous gene can boost the 
number of gene copies expressed in a host cell. 

Such expression constructs can comprise genes that either encode the native 
protein that is of interest or that encode a variant that exhibits enhanced activity as compared to 
the native protein. Such genes encoding proteins of interest can be constructed from the
sequences from Tables 1 and 2, fragments thereof, and substantially similar sequences thereto.

Such an exogenous gene can include either a constitutive promoter permitting 
expression in any cell in a host organism or a promoter that directs transcription only in 
particular cells or times during a host cell life cycle or in response to environmental stimuli. 

III.B.2. Regulatory Sequence Modulation

The SDFs of Tables 1 and 2, and fragments thereof, contain regulatory sequences
that can be used to enhance expression of a gene of interest. For example, some of these
sequences contain useful enhancer elements. In some cases, duplication of enhancer elements or
insertion of exogenous enhancer elements will increase expression of a desired gene from a
particular promoter. As other examples, all promoters require binding of a regulatory protein
to be activated, while some promoters may need a protein that signals a promoter binding
protein to expose a polymerase binding site. In either case, over-production of such proteins
can be used to enhance expression of a gene of interest by increasing the activation time of the
promoter.

Such regulatory proteins are encoded by some of the sequences in Tables 1 and
2, fragments thereof, and substantially similar sequences thereto.
Coding sequences for these proteins can be constructed as described above.

IV. Gene Constructs and Vector Construction 

To use isolated SDFs of the present invention or a combination of them or parts and/or
mutants and/or fusions of said SDFs in the above techniques, recombinant DNA vectors which
comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually
prepared. The SDF construct can be made using standard recombinant DNA techniques
(Sambrook et al. 1989) and can be introduced to the species of interest by
Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun
bombardment) as referenced below.

The vector backbone can be any of those typical in the art such as plasmids, viruses,
artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89:8794-8797 (1992);
Hamilton et al., Proc. Natl. Acad. Sci. USA 93:9975-9979 (1996);

(b) YAC: Burke et al., Science 236:806-812 (1987);

(c) PAC: Sternberg N. et al., Proc. Natl. Acad. Sci. USA. Jan;87(1):103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl. Acids Res. 23:4850-4856
(1995);

(e) Lambda Phage Vectors: Replacement vector, e.g., Frischauf et al., J. Mol. Biol.
170:827-842 (1983); or Insertion vector, e.g., Huynh et al., In: Glover NM (ed) DNA
Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol. Cell Biol. 1:175-194 (1990);
and

(g) Plasmid vectors: Sambrook et al., infra.

Typically, a vector will comprise the exogenous gene, which in its turn comprises an
SDF of the present invention to be introduced into the genome of a host cell, and which gene
may be an antisense construct, a ribozyme construct, a chimeraplast, or a coding sequence with
any desired transcriptional and/or translational regulatory sequences, such as promoters, UTRs,
and 3' end termination sequences. Vectors of the invention can also include origins of
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

A DNA sequence coding for the desired polypeptide, for example a cDNA sequence
encoding a full length protein, will preferably be combined with transcriptional and translational
initiation regulatory sequences which will direct the transcription of the sequence from the gene
in the intended tissues of the transformed plant.

For example, for over-expression, a plant promoter fragment may be employed that will 
direct transcription of the gene in all tissues of a regenerated plant. Alternatively, the plant 
promoter may direct transcription of an SDF of the invention in a specific tissue (tissue-specific 
promoters) or may be otherwise under more precise environmental control (inducible 
promoters). 

If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the
coding region is typically included. The polyadenylation region can be derived from the natural
gene, from a variety of other plant genes, or from T-DNA.

The vector comprising the sequences from genes or SDFs of the invention may
comprise a marker gene that confers a selectable phenotype on plant cells. The vector can
include a promoter and coding sequence, for instance. For example, the marker may encode
biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418,
bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or
phosphinothricin.
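
The vector components described above (promoter, transcribed sequence, polyadenylation region and selectable marker) can be summarized as a simple record. The Python sketch below is purely illustrative: the class, its field names and the example values are hypothetical labels, not identifiers of actual constructs of the invention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpressionCassette:
    """Illustrative record of the elements a vector typically carries."""
    promoter: str
    transcribed_sequence: str
    polyadenylation_region: Optional[str] = None
    selectable_marker: Optional[str] = None

    def missing_elements(self) -> List[str]:
        """List the optional elements that have not been specified."""
        missing = []
        if self.polyadenylation_region is None:
            missing.append("polyadenylation region")
        if self.selectable_marker is None:
            missing.append("selectable marker")
        return missing


# Hypothetical example: placeholder labels only.
cassette = ExpressionCassette(
    promoter="constitutive plant promoter",
    transcribed_sequence="SDF coding sequence",
    selectable_marker="kanamycin resistance",
)
print(cassette.missing_elements())
```

A check of this kind only flags which typical elements are absent; whether an element is actually needed depends on the intended use of the construct.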

IV.A. Coding Sequences

Generally, the sequence in the transformation vector and to be introduced into 
the genome of the host cell does not need to be absolutely identical to an SDF of the present 
invention. Also, it is not necessary for it to be full length, relative to either the primary 
transcription product or fully processed mRNA. Furthermore, the introduced sequence need not 
have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments 
can be incorporated into the coding sequence without changing the desired amino acid sequence 
of the polypeptide to be produced. 

IV.B. Promoters 

As explained above, introducing an exogenous SDF from the same species or an 
orthologous SDF from another species can modulate the expression of a native gene 
corresponding to that SDF of interest. Such an SDF construct can be under the control of 
either a constitutive promoter or a highly regulated inducible promoter (e.g., a copper 
inducible promoter). The promoter of interest can initially be either endogenous or 
heterologous to the species in question. When re-introduced into the genome of said species, 
such promoter becomes exogenous to said species. Over-expression of an SDF transgene can
lead to co-suppression of the homologous endogenous sequence, thereby creating some
alterations in the phenotypes of the transformed species as demonstrated by similar analysis
of the chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al.,
Plant Cell 2:291 (1990)). If an SDF is found to encode a protein with desirable
characteristics, its over-production can be controlled so that its accumulation can be
manipulated in an organ- or tissue-specific manner utilizing a promoter having such
specificity.

Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to
be tissue-specific or developmentally regulated, such a promoter can be utilized to drive or
facilitate the transcription of a specific gene of interest (e.g., seed storage protein or
root-specific protein). Thus, the level of accumulation of a particular protein can be manipulated
or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

SDFs of the present invention containing signal peptides are indicated in Tables 1 and
2. In some cases it may be desirable for the protein encoded by an introduced exogenous or
orthologous SDF to be targeted (1) to a particular organelle or intracellular compartment, (2) to
interact with a particular molecule such as a membrane molecule or (3) for secretion outside
of the cell harboring the introduced SDF. This can be accomplished using a signal peptide.

Signal peptides direct protein targeting, are involved in ligand-receptor interactions 
and act in cell to cell communication. Many proteins, especially soluble proteins, contain a 
signal peptide that targets the protein to one of several different intracellular compartments. 
In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER), 
mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein 
storage vessicles (PSV) and, in general, membranes. Some signal peptide sequences are 
conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide 
signal that targets proteins to the vacuole (Marty (1999) The Plant Cell 11: 587-599). Other 
signal peptides do not have a consensus sequence per se, but are largely composed of 
hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale 
and Denecke (1999) The Plant Cell 11: 615-628). Still others do not appear to contain either 
a consensus sequence or an identified common secondary sequence, for instance the 
chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11: 
557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an 
organelle and then to a membrane within the organelle (e.g. within the thylakoid lumen of the 
chloroplast; see Keegstra and Cline (1999) The Plant Cell 11: 557-570). In addition to the 
diversity in sequence and secondary structure, placement of the signal peptide is also varied. 
Proteins destined for the vacuole, for example, have targeting signal peptides found at the N- 
terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal 
peptides also serve as ligands for some receptors. 
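
As a minimal illustration of how a conserved targeting motif such as Asn-Pro-Ile-Arg could be 
located in a candidate polypeptide, the following Python sketch scans an amino acid sequence 
written in one-letter code. The function names, the 50-residue N-terminal window and the example 
sequence are hypothetical and are not taken from Tables 1 and 2. Presence of the motif alone 
does not establish vacuolar targeting. 

```python
import re

# One-letter code for the Asn-Pro-Ile-Arg motif mentioned in the text.
VACUOLAR_MOTIF = "NPIR"


def find_motif(protein_seq, motif=VACUOLAR_MOTIF):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq = protein_seq.upper()
    # A lookahead pattern lets overlapping occurrences be reported as well.
    pattern = "(?=" + re.escape(motif.upper()) + ")"
    return [match.start() for match in re.finditer(pattern, seq)]


def n_terminal_hits(protein_seq, window=50, motif=VACUOLAR_MOTIF):
    """Keep only hits that start within the first `window` residues.

    The window size is an illustrative assumption, not a value from the source.
    """
    return [pos for pos in find_motif(protein_seq, motif) if pos < window]


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the functions.
    example = "MKAFTLALSSRFNPIRLPTTHEPA"
    print(find_motif(example))
    print(n_terminal_hits(example))
```

A hit near the N-terminus would only flag a sequence for closer inspection; the text above notes 
that many targeting signals have no consensus sequence and would not be found this way. 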

These characteristics of signal peptides can be used to more tightly control the 
phenotypic expression of introduced SDFs. In particular, associating the appropriate signal 
sequence with a specific SDF can allow sequestering of the protein in specific organelles 
(plastids, as an example), secretion outside of the cell, targeted interaction with particular 
receptors, etc. Hence, the inclusion of signal peptides in constructs involving the SDFs of the 
invention increases the range of manipulation of SDF phenotypic expression. The nucleotide 
sequence of the signal peptide can be isolated from characterized genes using common 
molecular biological techniques or can be synthesized in vitro. 

In addition, the native signal peptide sequences, both amino acid and nucleotide, 
described in Tables 1 and 2 can be used to modulate polypeptide transport. Further variants of 
the native signal peptides described in Tables 1 and 2 are contemplated. Insertions, deletions, or 
substitutions can be made. Such variants will retain at least one of the functions of the native 
signal peptide as well as exhibiting some degree of sequence identity to the native sequence. 

Also, fragments of the signal peptides of the invention are useful and can be fused with 
other signal peptides of interest to modulate transport of a polypeptide. 

V. Transformation Techniques 

A wide range of techniques for inserting exogenous polynucleotides are known for a 
number of host cells, including, without limitation, bacterial, yeast, mammalian, insect and plant 
cells. 

Techniques for transforming a wide variety of higher plant species are well known and 
described in the technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 
22:421 (1988); and Christou, Euphytica, v. 85, n. 1-3:13-27 (1995). 

DNA constructs of the invention may be introduced into the genome of the desired plant 
host by a variety of conventional techniques. For example, the DNA construct may be 
introduced directly into the genomic DNA of the plant cell using techniques such as 
electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be 
introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. 
Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and 
introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions 
of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent 
marker into the plant cell DNA when the cell is infected by the bacteria (McCormac et al., Mol. 
Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141 
(1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)). 

Microinjection techniques are known in the art and well described in the scientific and 
patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is 
described in Paszkowski et al., EMBO J. 3:2717 (1984). Electroporation techniques are 
described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation 
techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium 
tumefaciens-mediated transformation techniques, including disarming and use of binary or 
co-integrate vectors, are well described in the scientific literature. See, for example, Hamilton, C.M., 
Gene 200:107 (1997); Müller et al., Mol. Gen. Genet. 207:171 (1987); Komari et al., Plant J. 
10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991); Gleave, A.P., Plant Mol. 
Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986); and Gould et al., Plant 
Physiology 95:426 (1991). 

Transformed plant cells which are derived by any of the above transformation 
techniques can be cultured to regenerate a whole plant that possesses the transformed genotype 
and thus the desired phenotype such as seedlessness. Such regeneration techniques rely on 
manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a 
biocide and/or herbicide marker which has been introduced together with the desired nucleotide 
sequences. Plant regeneration from cultured protoplasts is described in Evans et al., "Protoplasts 
Isolation and Culture," in Handbook of Plant Cell Culture, pp. 124-176, MacMillan Publishing 
Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, 
CRC Press, Boca Raton, 1988. Regeneration can also be obtained from plant callus, explants, 
organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., Ann. 
Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama 
et al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 
(1994)). The nucleic acids of the invention can be used to confer desired traits on essentially any 
plant. 

Thus, the invention has use over a broad range of plants, including species from the 
genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, 
Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, 
Helianthus, Hemerocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, 
Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, 
Pennisetum, Persea, Phaseolus, Pistacia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, 
Senecio, Sinapis, Solanum, Sorghum, Theobroma, Trigonella, Triticum, Vicia, Vitis, Vigna, 
and Zea. 

One of skill will recognize that after the expression cassette is stably incorporated in 
transgenic plants and confirmed to be operable, it can be introduced into other plants by 
sexual crossing. Any of a number of standard breeding techniques can be used, depending 
upon the species to be crossed. 

The particular sequences of SDFs identified are provided in the attached Tables 1 and 
2. One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, 
synthetic DNA fragments or polypeptides constituting desired sequences by recombinant 
methodology known in the art or described herein. 

EXAMPLES 

The invention is illustrated by way of the following examples. The invention is not 
limited by these examples as the scope of the invention is defined solely by the claims 
following. 

EXAMPLE 1: cDNA PREPARATION 

A number of the nucleotide sequences disclosed in Tables 1 and 2 herein as 
representative of the SDFs of the invention can be obtained by sequencing genomic DNA 
(gDNA) and/or cDNA from corn plants grown from HYBRID SEED # 35A19, purchased from 
Pioneer Hi-Bred International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131- 
0256. 

A number of the nucleotide sequences disclosed in Tables 1 and 2 herein as 
representative of the SDFs of the invention can also be obtained by sequencing genomic 
DNA from Arabidopsis thaliana, Wassilewskija ecotype or by sequencing cDNA obtained 
from mRNA from such plants as described below. This is a true breeding strain. Seeds of 
the plant are available from the Arabidopsis Biological Resource Center at the Ohio State 
University, under the accession number CS2360. Seeds of this plant were deposited under 
the terms and conditions of the Budapest Treaty at the American Type Culture Collection, 
Manassas, VA on August 31, 1999, and were assigned ATCC No. PTA-595. 

Other methods for cloning full-length cDNA are described, for example, by Seki et 
al., Plant Journal 15:707-720 (1998), "High-efficiency cloning of Arabidopsis full-length 
cDNA by biotinylated Cap trapper"; Maruyama et al., Gene 138:171 (1994), "Oligo-capping: a 
simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides"; 
and WO 96/34981. 

Tissues were, or each organ was, individually pulverized and frozen in liquid 
nitrogen. Next, the samples were homogenized in the presence of detergents and then 
centrifuged. The debris and nuclei were removed from the sample and more detergents were 
added to the sample. The sample was centrifuged and the debris was removed. Then the 
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by 
treatment with detergents and proteinase K followed by ethanol precipitation and 
centrifugation. The polysomal RNA from the different tissues was pooled according to the 
following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root, 
respectively. The pooled material was then used for cDNA synthesis by the methods 
described below. 
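
The pooling arithmetic can be sketched as follows. This is only an illustration of splitting a 
total RNA mass according to the stated 15/15/1 mass ratio; the function name, the tissue labels 
and the 31 µg total are assumptions made for the example and do not appear in the procedure above. 

```python
def split_by_ratio(total_mass, ratios):
    """Split `total_mass` among tissues in proportion to `ratios`.

    `ratios` maps a tissue label to its number of parts (e.g. 15, 15, 1).
    The result uses the same mass unit as `total_mass`.
    """
    total_parts = sum(ratios.values())
    if total_parts <= 0:
        raise ValueError("ratios must sum to a positive number of parts")
    return {tissue: total_mass * parts / total_parts
            for tissue, parts in ratios.items()}


if __name__ == "__main__":
    # Mass ratio stated in the text for the corn polysomal RNA pool.
    corn_ratio = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}
    # 31 (micrograms) is an arbitrary total chosen so each part equals 1.
    for tissue, mass in split_by_ratio(31.0, corn_ratio).items():
        print(tissue, mass)
```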

Starting material for cDNA synthesis for the exemplary corn cDNA clones 
with sequences presented in Tables 1 and 2 was poly(A)-containing polysomal mRNAs from 
inflorescences and root tissues of corn plants grown from HYBRID SEED # 35A19. Male 
inflorescences and female (pre-and post-fertilization) inflorescences were isolated at various 
stages of development. Selection for poly(A)-containing polysomal RNA was done using 
oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: 
A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The quality and the 
integrity of the polyA+ RNAs were evaluated. 

Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA 
clones with sequences presented in Tables 1 and 2 was polysomal RNA isolated from the 
top-most inflorescence tissues of Arabidopsis thaliana Wassilewskija (Ws.) and from roots of 
Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis 
Biological Resource Center. Nine parts inflorescence to every part root were used, as 
measured by wet mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the 
sample was homogenized in the presence of detergents and then centrifuged. The debris and 
nuclei were removed from the sample and more detergents were added to the sample. The 
sample was centrifuged, the debris was removed, and the sample was applied to a 2M 
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical 
Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford). The polysomal RNA was used 
for cDNA synthesis by the methods described below. Polysomal mRNA was then isolated as 
described above for corn cDNA. The quality of the RNA was assessed electrophoretically. 

Following preparation of the mRNAs from various tissues as described above, selection 
of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of 
such mRNA was performed using either a chemical or enzymatic approach. Both techniques 
take advantage of the presence of the "cap" structure, which characterizes the 5' end of most 
intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position. 

The chemical modification approach involves the optional elimination of the 2', 3'-cis 
diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of 
the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde thus obtained to 
a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for 
obtaining mRNAs having intact 5' ends is disclosed in International Application No. 
WO 96/34981 published November 7, 1996. 

The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of 
mRNAs involves the removal of the phosphate groups present on the 5' ends of uncapped 
incomplete mRNAs, the subsequent decapping of mRNAs having intact 5' ends and the ligation 
of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further 
detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is 
disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des 
ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de 
l'expression de la tryptophane hydroxylase de rat, 20 Dec. 1993), EPO 625572 and Kato et al., 
Gene 150:243-250 (1994). 

In both the chemical and the enzymatic approach, the oligonucleotide tag has a 
restriction enzyme site (e.g. an EcoRI site) therein to facilitate later cloning procedures. 
Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA is 
examined by performing a Northern blot using a probe complementary to the oligonucleotide 
tag. 

For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic 
method, first strand cDNA synthesis is performed using an oligo-dT primer with reverse 
transcriptase. This oligo-dT primer can contain an internal tag of at least 4 nucleotides, which 
can be different from one mRNA preparation to another. Methylated dCTP is used for cDNA 
first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. 
The first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline 
hydrolysis to eliminate residual primers. 

Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow 
fragment and a primer corresponding to the 5' end of the ligated oligonucleotide. The primer is 
typically 20-25 bases in length. Methylated dCTP is used for second strand synthesis in order to 
protect internal EcoRI sites in the cDNA from digestion during the cloning process. 

Following second strand synthesis, the full-length cDNAs are cloned into a phagemid 
vector, such as pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with 
T4 DNA polymerase (Biolabs) and the cDNA is digested with EcoRI. Since methylated dCTP 
is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated 
site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate 
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs. 

The full-length cDNAs are then size fractionated using either exclusion chromatography 
(AcA, Biosepra) or electrophoretic separation, which yields 3 to 6 different fractions. The 
full-length cDNAs are then directionally cloned into pBlueScript™ using either the EcoRI and 
SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the 
EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by 
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection. 

Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as 
follows. 

The plasmid cDNA libraries made as described above are purified (e.g. by a column 
available from Qiagen). A positive selection of the tagged clones is performed as follows. 
Briefly, in this selection procedure, the plasmid DNA is converted to single stranded DNA using 
phage F1 gene II endonuclease in combination with an exonuclease (Chang et al., Gene 127:95 
(1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is 
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13: 124 (1992). 
Here the single stranded DNA is hybridized with a biotinylated oligonucleotide having a 
sequence corresponding to the 3' end of the oligonucleotide tag. Preferably, the primer has a 
length of 20-25 bases. Clones including a sequence complementary to the biotinylated 
oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by 
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the 
magnetic beads and converted into double stranded DNA using a DNA polymerase such as 
ThermoSequenase™ (obtained from Amersham Pharmacia Biotech). Alternatively, protocols 
such as the Gene Trapper™ kit (Gibco BRL) can be used. The double stranded DNA is then 
transformed, preferably by electroporation, into bacteria. The percentage of positive clones 
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot 
analysis. 

Following transformation, the libraries are ordered in microtiter plates and sequenced. 
The Arabidopsis library was deposited at the American Type Culture Collection on January 
7, 2000 as "E-coli liba 010600" under the accession number PTA-1161. 

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

The SDFs of the invention can be used in Southern hybridizations as described above. 
The following describes extraction of DNA from nuclei of plant cells, digestion of the 
nuclear DNA and separation by length, transfer of the separated fragments to membranes, 
preparation of probes for hybridization, hybridization and detection of the hybridized probe. 

The procedures described herein can be used to isolate related polynucleotides or for 
diagnostic purposes. Moderate stringency hybridization conditions, as defined above, are 
described in the present example. These conditions result in detection of hybridization 
between sequences having at least 70% sequence identity. As described above, the 
hybridization and wash conditions can be changed to reflect the desired percentage of 
sequence identity between probe and target sequences that can be detected. 

In the following procedure, a probe for hybridization is produced from two PCR 
reactions using two primers from genomic sequence of Arabidopsis thaliana. As described 
above, the particular template for generating the probe can be any desired template. 

The first PCR product is assessed to confirm that it is of 
the expected size. Then the product of the first PCR is used as a template, with the same pair 
of primers used in the first PCR, in a second PCR that produces a labeled product used as the 
probe. 

Fragments detected by hybridization, or other bands of interest, can be isolated from 
gels used to separate genomic DNA fragments by known methods for further purification 
and/or characterization. 

Buffers for nuclear DNA extraction 

1. 10X HB 

                                 1000 ml 
    40 mM spermidine             10.2 g     Spermine (Sigma S-2876) and spermidine (Sigma 
    10 mM spermine                3.5 g     S-2501) stabilize chromatin and the nuclear membrane 
    0.1 M EDTA (disodium)        37.2 g     EDTA inhibits nuclease 
    0.1 M Tris                   12.1 g     Buffer 
    0.8 M KCl                    59.6 g     Adjusts ionic strength for stability of nuclei 


Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in 
leaves. Use of pH 9.5 appears to inactivate this nuclease. 
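
The gram amounts in the 10X HB recipe follow from mass = molarity x molar mass x volume. The 
sketch below recomputes them and rescales the recipe to other volumes. The molar masses are 
assumptions: the recipe does not state which salt forms were weighed, and the values used here 
(spermidine trihydrochloride, spermine tetrahydrochloride, disodium EDTA dihydrate, Tris base) 
were chosen because they agree with the listed amounts. 

```python
# Molar masses in g/mol. ASSUMED salt forms; the source gives only grams per liter.
MOLAR_MASS = {
    "spermidine": 254.63,   # assumed: trihydrochloride
    "spermine": 348.18,     # assumed: tetrahydrochloride
    "EDTA": 372.24,         # assumed: disodium salt, dihydrate
    "Tris": 121.14,         # assumed: Tris base
    "KCl": 74.55,
}

# Concentrations of the 10X HB stock in mol/L, as listed in the recipe.
TARGET_MOLARITY = {
    "spermidine": 0.040,
    "spermine": 0.010,
    "EDTA": 0.1,
    "Tris": 0.1,
    "KCl": 0.8,
}


def grams_needed(molarity, molar_mass, volume_l):
    """Mass of solute (g) for a solution of `molarity` (mol/L) and `volume_l` (L)."""
    return molarity * molar_mass * volume_l


def hb_10x_recipe(volume_l=1.0):
    """Grams of each component for `volume_l` liters of 10X HB, rounded to 0.1 g."""
    return {name: round(grams_needed(conc, MOLAR_MASS[name], volume_l), 1)
            for name, conc in TARGET_MOLARITY.items()}


if __name__ == "__main__":
    for name, grams in hb_10x_recipe(1.0).items():
        print(name, grams, "g per liter")
```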



2. 2 M sucrose (684 g per 1000 ml) 

Heat about half the final volume of water to about 50°C. Add the sucrose slowly then 
bring the mixture to close to final volume; stir constantly until it has dissolved. Bring 
the solution to volume. 

3. Sarkosyl solution (lyses nuclear membranes) 

1000 ml 

N-lauroyl sarcosine (Sarkosyl) 20.0 g 

0.1 M Tris 12.1 g 

0.04 M EDTA (Disodium) 14.9 g 



Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper 
volume. 




4. 20% Triton X-100 

    80 ml Triton X-100 

    320 ml 1X HB (w/o β-ME and PMSF) 

Prepare in advance; Triton takes some time to dissolve 

Procedure 

Prepare 1X "H" buffer (keep ice-cold during use) 

                                 1000 ml 
    10X HB                       100 ml 
    2 M sucrose                  250 ml     a non-ionic osmoticum 
    Water                        634 ml 

Added just before use: 

    100 mM PMSF*                  10 ml     a protease inhibitor; protects nuclear membrane proteins 
    β-mercaptoethanol              1 ml     inactivates nuclease by reducing disulfide bonds 

*100 mM PMSF 
(phenyl methyl sulfonyl fluoride, Sigma P-7626) 
(add 0.0875 g to 5 ml 100% ethanol) 

Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure 
that you use 5-10 ml of HB buffer per gram of tissue. Blenders generate heat so be 
sure to keep the homogenate cold. It is necessary to put the blenders in ice 
periodically. 
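
For planning purposes, the buffer volume implied by the 5-10 ml per gram guideline, and the 
component volumes for a batch of that size, can be estimated as in the sketch below. The 
per-liter volumes are those listed for the H buffer above; the function names and the 40 g 
example are hypothetical. 

```python
# Component volumes (ml) listed per nominal 1000 ml of 1X H buffer.
H_BUFFER_PER_LITER_ML = {
    "10X HB": 100.0,
    "2 M sucrose": 250.0,
    "water": 634.0,
    "100 mM PMSF (add just before use)": 10.0,
    "beta-mercaptoethanol (add just before use)": 1.0,
}


def buffer_volume_range_ml(tissue_g, low_ml_per_g=5.0, high_ml_per_g=10.0):
    """Return the (minimum, maximum) buffer volume in ml for `tissue_g` grams of tissue."""
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    return tissue_g * low_ml_per_g, tissue_g * high_ml_per_g


def scale_h_buffer(nominal_volume_ml):
    """Scale the per-liter component volumes to a batch of `nominal_volume_ml`."""
    if nominal_volume_ml <= 0:
        raise ValueError("batch volume must be positive")
    factor = nominal_volume_ml / 1000.0
    return {name: round(ml * factor, 2) for name, ml in H_BUFFER_PER_LITER_ML.items()}


if __name__ == "__main__":
    low, high = buffer_volume_range_ml(40.0)   # 40 g of tissue is an arbitrary example
    print("buffer needed:", low, "to", high, "ml")
    for name, ml in scale_h_buffer(high).items():
        print(name, ml, "ml")
```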



Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 
20 min. This lyses plastid, but not nuclear, membranes. 

Filter the tissue suspension through several nylon filters into an ice-cold beaker. The 
first filtration is through a 250-micron membrane; the second is through an 85-micron 
membrane; the third is through a 50-micron membrane; and the fourth is through a 
20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up 
by gently squeezing the liquid through the filters. 

Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei. 

Discard the dark green supernatant. The pellet will have several layers to it. One is 
starch; it is white and gritty. The nuclei are gray and soft. In the early steps, there 
may be a dark green and somewhat viscous layer of chloroplasts. 

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by 
swirling gently and pipetting. After the pellets are resuspended, pellet the nuclei again 
at 1200 - 1300 x g. Discard the supernatant. 

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a 
pale green. This usually happens after 3 or 4 resuspensions. At this point, the pellet 
is typically grayish white and very slippery. The Triton X-100 in these repeated steps 
helps to destroy the chloroplasts and mitochondria that contaminate the prep. 

Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the 
suspension to a sterile 125 ml Erlenmeyer flask. 

Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) 
while swirling gently. This lyses the nuclei. The solution will become very viscous. 

Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in 
solution. The mixture will be gray, white and viscous. 

Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin 
is, the firmer the protein pellicle. 

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) 
under a protein pellicle. Carefully remove the solution under the protein pellicle and 
above the pellet. Determine the density of the solution by weighing 1 ml of solution 
and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved 
solids (sucrose etc.) and the refractive index alone will not be an accurate guide to 
CsCl concentration. 

11. Add 20 µl of 10 mg/ml EtBr per ml of solution. 

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor. 

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer 
pipette and discard. Carefully remove the DNA band with another transfer pipette. 
The DNA band is usually visible in room light; otherwise, use a long wave UV light 
to locate the band. 

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once 
the solution is clear, extract at least two more times to ensure that all of the EtBr is 
gone. Be very gentle, as it is very easy to shear the DNA at this step. This extraction 
may take a while because the DNA solution tends to be very viscous. If the solution 
is too viscous, dilute it with TE. 

15. Dialyze the DNA for at least two days against several changes (at least three times) of 
TE (10 mM Tris, 1 mM EDTA, pH 8) to remove the cesium chloride. 

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a 
lot of debris, centrifuge the DNA solution at least at 2500 x g for 10 min. and 
carefully transfer the clear supernatant to a new tube. Read the A260 concentration of 
the DNA. 



17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the 
DNA. Load 50 ng and 100 ng (based on the OD reading) and compare it with known 
and good quality DNA. Undigested lambda DNA and lambda-HindIII-digested 
DNA are good molecular weight markers. 
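
The A260 reading of step 16 and the 50 ng and 100 ng loads of step 17 can be related as in the 
following sketch. It assumes the commonly used conversion of 50 µg/ml of double-stranded DNA 
per A260 unit at a 1 cm path length, which is not stated in the protocol; the function names, 
the example reading and the dilution factor are hypothetical. 

```python
# ASSUMPTION: 1.0 A260 unit corresponds to about 50 micrograms/ml of dsDNA (1 cm path).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate the dsDNA concentration of the undiluted sample.

    micrograms/ml and ng/microliter are numerically identical, so the result
    can be read in either unit.
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("A260 must be >= 0 and dilution factor must be > 0")
    return a260 * dilution_factor * DSDNA_UG_PER_ML_PER_A260


def volume_to_load_ul(target_ng, conc_ng_per_ul):
    """Volume (microliters) that contains `target_ng` of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.5 measured on a 1:10 dilution.
    conc = dsdna_conc_ng_per_ul(0.5, dilution_factor=10)
    for load_ng in (50, 100):   # the two loads named in step 17
        print(load_ng, "ng ->", volume_to_load_ul(load_ng, conc), "microliters")
```

Where the calculated volume is too small to pipette accurately, a working dilution of the DNA 
would be loaded instead. 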

Protocol for Digestion of Genomic DNA 

Protocol : 

1. The relative amounts of DNA for different crop plants that provide approximately a 
balanced number of genome equivalents are given in Table 3. Note that due to the size 
of the wheat genome, wheat DNA will be underrepresented. Lambda DNA provides 
a useful control for complete digestion. 

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20 °C for at 
least two hours. Yeast DNA can be purchased and made up at the necessary 
concentration, therefore no precipitation is necessary for yeast DNA. 

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be 
careful not to disturb the pellet). Be sure that the residual ethanol is completely 
removed either by vacuum desiccation or by carefully wiping the sides of the tubes 
with a clean tissue. 

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully 
resuspended before proceeding to the next step. This may take about 30 min. 

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of 
the restriction enzyme to the resuspended DNA followed by the appropriate volume 
of enzymes. Be sure to mix it properly by slowly swirling the tubes. 

6. Set-up the lambda digestion-control for each DNA that you are digesting. 

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down 
condensation in a microfuge before proceeding. 



8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% 
xylene cyanol in 15% Ficoll or 30% glycerol) to the lambda-control digests and load 
in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2 mM EDTA, pH 8). If the 
lambda DNA in the lambda control digests is completely digested, proceed with the 
precipitation of the genomic DNA in the digests. 

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at 
-20°C for at least 2 hours (preferably overnight). 

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; 
they don't have to be precipitated. 

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) 
and an appropriate volume of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be 
careful in pipetting the loading dye - it is viscous. Be sure you are pipetting the 
correct volume. 



Table 3 

Some guide points in digesting genomic DNA. 

Species        Genome Size    Size Relative to    Genome Equivalent to      Amount of DNA 
                              Arabidopsis         2 µg Arabidopsis DNA      per blot 

Arabidopsis    120 Mb         1X                  1X                        2 µg 
Brassica       1,100 Mb       9.2X                0.54X                     10 µg 
Corn           2,800 Mb       23.3X               0.43X                     20 µg 
Cotton         2,300 Mb       19.2X               0.52X                     20 µg 
Oat            11,300 Mb      94X                 0.11X                     20 µg 
Rice           400 Mb         3.3X                0.75X                     5 µg 
Soybean        1,100 Mb       9.2X                0.54X                     10 µg 
Sugarbeet      758 Mb         6.3X                0.8X                      10 µg 
Sweetclover    1,100 Mb       9.2X                0.54X                     10 µg 
Wheat          16,000 Mb      133X                0.08X                     20 µg 
Yeast          15 Mb          0.12X               1X                        0.25 µg 
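
The two derived columns of Table 3 appear to follow from the genome sizes: the relative size is 
the genome size divided by the 120 Mb Arabidopsis genome, and the genome equivalent is the 
amount loaded divided by (2 µg x relative size). The sketch below reproduces that arithmetic; 
the function names are hypothetical, and the formula is inferred from the table values rather 
than stated in the text. 

```python
ARABIDOPSIS_GENOME_MB = 120.0      # genome size used as the reference in Table 3
REFERENCE_ARABIDOPSIS_UG = 2.0     # amount of Arabidopsis DNA defined as 1X


def relative_size(genome_mb):
    """Genome size expressed as a multiple of the Arabidopsis genome."""
    return genome_mb / ARABIDOPSIS_GENOME_MB


def genome_equivalents(genome_mb, amount_ug):
    """Genome copies in `amount_ug`, relative to 2 micrograms of Arabidopsis DNA."""
    return amount_ug / (REFERENCE_ARABIDOPSIS_UG * relative_size(genome_mb))


def amount_for_equivalents(genome_mb, equivalents=1.0):
    """Micrograms of DNA required to load the given number of genome equivalents."""
    return equivalents * REFERENCE_ARABIDOPSIS_UG * relative_size(genome_mb)


if __name__ == "__main__":
    # (genome size in Mb, micrograms per blot) taken from Table 3.
    rows = {"Brassica": (1100, 10), "Corn": (2800, 20), "Rice": (400, 5),
            "Wheat": (16000, 20), "Yeast": (15, 0.25)}
    for species, (size_mb, ug) in rows.items():
        print(species, round(relative_size(size_mb), 2),
              round(genome_equivalents(size_mb, ug), 3))
```

Calling amount_for_equivalents(16000) gives roughly 267 µg, which illustrates why wheat DNA is 
described as underrepresented at 20 µg per blot. 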



Protocol for Southern Blot Analysis 

The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. 
Low-voltage, overnight separations are preferred. The gels are stained with EtBr and 
photographed. 



1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for 
about 15 min. 

2. Then briefly rinse with water. The DNA is denatured by incubating twice 
(with shaking) in 0.5 M NaOH in 1.5 M NaCl for 15 min. 

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with 
shaking) in 1.5 M Tris pH 7.5 in 1.5 M NaCl for 15 min. 

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X 
SSC for at least 15 min. before use. (20x SSC is 175.3 g NaCl, 88.2 g sodium citrate 
per liter, adjusted to pH 7.0.) 

5. The nylon membrane is placed on top of the gel and all bubbles in between are 
removed. The DNA is blotted from the gel to the membrane using an absorbent 
medium, such as paper toweling, and 6X SSC buffer. After the transfer, the membrane 
may be lightly brushed with a gloved hand to remove any agarose sticking to the 
surface. 



6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The 
membrane is stored at 4°C until use. 
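
The 6X SSC used in steps 4 and 5 is a dilution of the 20X SSC stock defined in step 4. A minimal 
sketch of the dilution arithmetic (stock volume = final strength x final volume / stock 
strength) is given below; the function name and the 500 ml example volume are hypothetical. 

```python
def dilute_stock(stock_x, final_x, final_volume_ml):
    """Return (stock_ml, water_ml) needed to prepare a diluted working solution.

    `stock_x` and `final_x` are fold-concentrations, e.g. 20 for 20X SSC.
    """
    if stock_x <= 0 or final_x <= 0 or final_volume_ml <= 0:
        raise ValueError("all arguments must be positive")
    if final_x > stock_x:
        raise ValueError("cannot dilute to a strength above that of the stock")
    stock_ml = final_x * final_volume_ml / stock_x
    return stock_ml, final_volume_ml - stock_ml


if __name__ == "__main__":
    stock_ml, water_ml = dilute_stock(20, 6, 500)   # 500 ml of 6X SSC from 20X SSC
    print(stock_ml, "ml of 20X SSC plus", water_ml, "ml of water")
```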

B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis 



Amplification procedures : 

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate: 

Volume        Stock                                   Final Amount or Conc. 

0.5 µl        ~10 ng/µl genomic DNA¹                  5 ng 
2.5 µl        10X PCR buffer                          20 mM Tris, 50 mM KCl 
0.75 µl       50 mM MgCl2                             1.5 mM 
1 µl          10 pmol/µl Primer 1 (Forward)           10 pmol 
1 µl          10 pmol/µl Primer 2 (Reverse)           10 pmol 
0.5 µl        5 mM dNTPs                              0.1 mM 
0.1 µl        5 units/µl Platinum Taq™ (Life          1 unit 
              Technologies, Gaithersburg, MD) 
              DNA Polymerase 
(to 25 µl)    Water 
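
The final concentrations listed for MgCl2 and the dNTPs can be checked, and the volume of water 
determined, with the short sketch below. It uses only the stock concentrations and volumes of 
the reaction mix above; the function names are hypothetical. 

```python
TOTAL_UL = 25.0   # final reaction volume in microliters


def final_concentration(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting `volume_ul` of stock into `total_ul`.

    The result carries the same unit as `stock_conc` (e.g. mM).
    """
    return stock_conc * volume_ul / total_ul


def amount_added(stock_per_ul, volume_ul):
    """Absolute amount delivered, e.g. pmol of primer or ng of template."""
    return stock_per_ul * volume_ul


def water_to_volume(component_volumes_ul, total_ul=TOTAL_UL):
    """Water (microliters) needed to bring the listed components to `total_ul`."""
    water = total_ul - sum(component_volumes_ul)
    if water < 0:
        raise ValueError("components already exceed the final volume")
    return water


if __name__ == "__main__":
    print("MgCl2 (mM):", final_concentration(50, 0.75))
    print("dNTPs (mM):", final_concentration(5, 0.5))
    print("template (ng):", amount_added(10, 0.5))
    print("primer (pmol):", amount_added(10, 1))
    # Volumes of DNA, buffer, MgCl2, two primers, dNTPs and polymerase.
    print("water (microliters):",
          round(water_to_volume([0.5, 2.5, 0.75, 1, 1, 0.5, 0.1]), 2))
```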





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine: 

1) 94°C for 10 min. followed by 

2) 5 cycles:     94°C - 30 sec 
                 62°C - 30 sec 
                 72°C - 3 min 

3) 5 cycles:     94°C - 30 sec 
                 58°C - 30 sec 
                 72°C - 3 min 

4) 25 cycles:    94°C - 30 sec 
                 53°C - 30 sec 
                 72°C - 3 min 

5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C. 

The procedure can be adapted to a multi-well format if necessary. 

¹ Arabidopsis DNA is used in the present experiment, but the procedure is a general one. 

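
The cycling program can be written down as data, which makes it easy to total the number of 
cycles and the programmed hold time. The sketch below is illustrative only: the stage labels 
and function names are hypothetical, and the total excludes the time the instrument spends 
ramping between temperatures. 

```python
# Each stage: (label, number of cycles, [(temperature in deg C, hold in seconds), ...]).
PROGRAM = [
    ("initial denaturation", 1, [(94, 600)]),
    ("annealing at 62 C", 5, [(94, 30), (62, 30), (72, 180)]),
    ("annealing at 58 C", 5, [(94, 30), (58, 30), (72, 180)]),
    ("annealing at 53 C", 25, [(94, 30), (53, 30), (72, 180)]),
    ("final extension", 1, [(72, 420)]),
]


def total_hold_seconds(program):
    """Sum of all programmed hold times (ramping between temperatures is ignored)."""
    return sum(cycles * sum(hold for _temp, hold in steps)
               for _label, cycles, steps in program)


def amplification_cycles(program):
    """Count cycles in the three-step (denature, anneal, extend) stages."""
    return sum(cycles for _label, cycles, steps in program if len(steps) == 3)


if __name__ == "__main__":
    print("amplification cycles:", amplification_cycles(PROGRAM))
    print("programmed hold time (min):", total_hold_seconds(PROGRAM) / 60)
```
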
Quantification and Dilution of PCR Products: 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A 
linearized plasmid DNA can be used as a quantification standard (usually at 50, 100, 
200, and 400 ng). These will be used as references to approximate the amount of 
PCR products. HindIII-digested Lambda DNA is useful as a molecular weight 
marker. The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is 
examined to determine that the size of the PCR products is consistent with the 
expected size and if there are significant extra bands or smeary products in the PCR 
reactions. 

2. The amounts of PCR products can be estimated on the basis of the plasmid standard. 

3. For the small number of reactions that produce extraneous bands, a small amount of
DNA from bands with the correct size can be isolated by dipping a sterile 10-µl tip
into the band while viewing through a UV Transilluminator. The small amount of
agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA 

Solutions:

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5
U/µl Platinum Taq Polymerase, and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65
mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81
mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875
mM dTTP, 0.125 mM DIG-11-dUTP)



TE buffer (10 mM Tris, 1 mM EDTA, pH 8) 

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid 
and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir 
for 15 min. and sterilize. 
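The weights in the maleate buffer recipe correspond to roughly 0.1 M maleic acid and 0.15 M NaCl, which agrees with the composition given for Buffer 1 in section E. A short check in Python is sketched below; the molar masses are standard values that are assumed here and are not stated in the protocol, and the names are illustrative.

```python
# Molar masses in g/mol (standard values, not stated in the protocol).
MW_MALEIC_ACID = 116.07
MW_NACL = 58.44


def molarity(grams, molar_mass, litres=1.0):
    """Molar concentration of a solute dissolved to a final volume in litres."""
    return grams / molar_mass / litres


maleic_acid_M = molarity(11.61, MW_MALEIC_ACID)
nacl_M = molarity(8.77, MW_NACL)
print(f"maleic acid: {maleic_acid_M:.3f} M, NaCl: {nacl_M:.3f} M")
```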



10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176). Heat to 60°C
while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir
and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from
autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved
distilled water.

Procedure:

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                   1X
MgCl2                        1.5 mM
10X dNTP + DIG-11-dUTP       1X (please see the note below)
Platinum Taq™ Polymerase     1 unit
10 pg probe DNA
10 pmol primer 1

Note:

                                    Use for:
10X dNTP + DIG-11-dUTP (1:5)        < 1 kb
10X dNTP + DIG-11-dUTP (1:10)       1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)       > 1.8 kb
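The note maps probe length to one of the three labeling mixes. A minimal Python sketch of that lookup follows, together with the dTTP to DIG-11-dUTP ratio implied by each mix composition in the Solutions list. The function names are hypothetical, and the handling of a probe of exactly 1 kb or 1.8 kb is an assumption, since the note does not say which mix applies at the boundaries.

```python
# 10X mixes from the Solutions list: (dTTP mM, DIG-11-dUTP mM).
DIG_MIXES = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}


def choose_dig_mix(probe_length_kb):
    """Return the mix label recommended by the note for a given probe length."""
    if probe_length_kb < 1.0:
        return "1:5"
    if probe_length_kb <= 1.8:  # boundary handling is an assumption
        return "1:10"
    return "1:15"


def dttp_per_dig(label):
    """dTTP to DIG-11-dUTP molar ratio actually present in a mix."""
    dttp, dig = DIG_MIXES[label]
    return dttp / dig


for length_kb in (0.6, 1.2, 2.5):
    label = choose_dig_mix(length_kb)
    print(length_kb, "kb ->", label, f"(dTTP:DIG = {dttp_per_dig(label):.1f}:1)")
```

In each mix the dTTP and DIG-11-dUTP concentrations sum to 2 mM, the same as each of the other three nucleotides.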



2. The PCR reaction uses the following amplification cycles:

1) 94°C for 10 min.

2) 5 cycles:   95°C - 30 sec
               61°C - 1 min
               73°C - 5 min

3) 5 cycles:   95°C - 30 sec
               59°C - 1 min
               75°C - 5 min

4) 25 cycles:  95°C - 30 sec
               51°C - 1 min
               73°C - 5 min

5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).

The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an
aliquot of the unlabelled probe starting material.



The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris
and 1 mM EDTA, pH 8) as shown in the following table:

DIG-labeled control      Stepwise Dilution       Final Conc.
DNA starting conc.                               (Dilution Name)
5 ng/µl                  1 µl in 49 µl TE        100 pg/µl (A)
100 pg/µl (A)            25 µl in 25 µl TE       50 pg/µl (B)
50 pg/µl (B)             25 µl in 25 µl TE       25 pg/µl (C)
25 pg/µl (C)             20 µl in 30 µl TE       10 pg/µl (D)
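Each row of the dilution table can be reproduced by applying the dilution factor of that step to the concentration from the previous row. The Python sketch below walks through the series; the helper `dilute` and the variable names are illustrative only.

```python
def dilute(conc, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of stock with diluent_ul of TE."""
    return conc * sample_ul / (sample_ul + diluent_ul)


# (name, sample µl, TE µl) for dilutions A-D, as listed in the table.
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]

conc_pg_per_ul = 5000.0  # 5 ng/µl control DNA expressed in pg/µl
series = {}
for name, sample_ul, te_ul in STEPS:
    conc_pg_per_ul = dilute(conc_pg_per_ul, sample_ul, te_ul)
    series[name] = conc_pg_per_ul

print(series)
```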



Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg
are spotted onto a positively charged nylon membrane, marking the membrane
lightly with a pencil to identify each dilution.

Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe 
are spotted. 

The membrane is fixed by UV crosslinking. 

The membrane is wetted with a small amount of maleate buffer and then 
incubated in 1% blocking solution for 15 min at room temp. 

The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG
antibody (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) and
an NBT substrate according to the manufacturer's instructions.






Spot intensities of the control and experimental dilutions are then compared to
estimate the concentration of the PCR-DIG-labeled probe.

D. Prehybridization and Hybridization of Southern Blots

Solutions:

100% Formamide purchased from Gibco 

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3 citrate)

per L: 175 g NaCl
       87.5 g Na3 citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine) 
20% SDS (sodium dodecyl sulphate) 

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder. Heat to 60°C while stirring to dissolve the powder. Adjust the volume
to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final             Components          Volume            Stock
Concentration                         (per 100 ml)
50%               Formamide           50 ml             100%
5X                SSC                 25 ml             20X
0.1%              Sarkosyl            0.5 ml            20%
0.02%             SDS                 0.1 ml            20%
2%                Blocking Reagent    20 ml             10%
                  Water               4.4 ml





General Procedures:

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of
prehybridization solution (30 ml/100 cm²) at room temperature. Seal the bag with a
heat sealer, avoiding bubbles as much as possible. Lay down the bags in a large
plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are
lying flat in the tray so that the prehybridization solution is evenly distributed
throughout the bag. Incubate the blot for at least 2 hours with gentle agitation using a
waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR 
machine and immediately cool it to 4°C. 

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and 
mix well but avoid foaming. Bubbles may lead to background. 

4. Pour off the prehybridization solution from the hybridization bags and add new 
prehybridization and probe solution mixture to the bags containing the membrane. 

5. Incubate with gentle agitation for at least 16 hours. 

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution
per membrane.

To avoid background, keep the membranes fully submerged so that they do not dry in spots;
agitate sufficiently to avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development. 
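The solution volume and probe mass in steps 1 and 3 scale with membrane area (30 ml per 100 cm², 25 ng of probe per ml). The following Python sketch scales those quantities, and the prehybridization mix recipe, to an arbitrary membrane area. It is an illustration under the stated figures only; the function `hybridization_plan` and the dictionary keys are hypothetical names.

```python
PREHYB_ML_PER_100_CM2 = 30.0   # from step 1
PROBE_NG_PER_ML = 25.0         # from step 3

# Prehybridization mix recipe, ml of each stock per 100 ml of mix.
RECIPE_ML_PER_100_ML = {
    "formamide (100%)": 50.0,
    "SSC (20X)": 25.0,
    "Sarkosyl (20%)": 0.5,
    "SDS (20%)": 0.1,
    "blocking reagent (10%)": 20.0,
    "water": 4.4,
}


def hybridization_plan(membrane_area_cm2):
    """Scale solution volume, probe mass and recipe to a membrane area."""
    volume_ml = PREHYB_ML_PER_100_CM2 * membrane_area_cm2 / 100.0
    probe_ng = PROBE_NG_PER_ML * volume_ml
    recipe = {k: v * volume_ml / 100.0 for k, v in RECIPE_ML_PER_100_ML.items()}
    return volume_ml, probe_ng, recipe


volume_ml, probe_ng, recipe = hybridization_plan(100.0)
print(volume_ml, "ml,", probe_ng, "ng probe")
```

For a 100 cm² membrane this reproduces the 30 ml and 750 ng quoted in step 3.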

E. Procedure for Immunological Detection with CSPD 

Solutions:

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl;
adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1.
Dissolve blocking reagent powder (Boehringer Mannheim, Indianapolis, IN,
cat. no. 1096176) by constantly stirring on a 65°C heating block or heat in a
microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure:

1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate 
washing buffer with gentle shaking. 

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking. 

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) 
at 75 mU/ml (1:10,000) in Buffer 2 is used for detection. 75 ml of solution can be 
used for 3 blots. 

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking. 

5. The membranes are washed twice in washing buffer with gentle shaking. About 250
ml is used per wash for 3 blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer. 

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and 
stored in the dark at 4°C). 

The following steps must be done individually. Bags (one for detection and one for
exposure) are generally cut and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed
without drying the membrane. The blot is immediately placed in a bag and 1.5 ml of
CSPD solution is added. The CSPD solution can be spread over the membrane.
Bubbles present at the edge and on the surface of the blot are typically removed by
gentle rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on 
Whatman 3 MM paper. Do not let the membrane dry completely. 

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to
enhance the luminescent reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be 
taken. Luminescence continues for at least 24 hours and signal intensity increases 
during the first hours. 
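The antibody and CSPD working solutions in steps 3, 7 and 8 are simple dilutions of a stock. The Python sketch below computes the stock volumes involved, assuming that "1:N" means one part stock in N parts total (the protocol does not say which convention it uses); the function name is illustrative.

```python
def stock_for_dilution_ul(final_volume_ml, dilution_factor):
    """µl of stock for a 1:dilution_factor dilution (1 part in that many total)."""
    return final_volume_ml * 1000.0 / dilution_factor


# Anti-DIG-AP conjugate: 75 ml at 1:10,000 for 3 blots (step 3).
antibody_ul = stock_for_dilution_ul(75.0, 10_000)

# CSPD: 1.5 ml of a 1:200 dilution per blot (steps 7 and 8).
cspd_ul_per_blot = stock_for_dilution_ul(1.5, 200)

# Stock activity implied by 75 mU/ml after a 1:10,000 dilution, in U/ml.
implied_stock_u_per_ml = 75e-3 * 10_000

print(antibody_ul, cspd_ul_per_blot, implied_stock_u_per_ml)
```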

Example 3: Transformation of Carrot Cells

Transformation of plant cells can be accomplished by a number of methods, as 
described above. Similarly, a number of plant genera can be regenerated from tissue culture 
following transformation. Transformation and regeneration of carrot cells as described herein 
is illustrative. 

Single cell suspension cultures of carrot (Daucus carota) cells are established from
hypocotyls of cultivar Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant
Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods known in
the art. The suspension cultures are subcultured by adding 10 ml of the suspension culture to
40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150
rpm at 27°C in the dark.

The suspension culture cells are transformed with exogenous DNA as described by Z.
Chen et al., Plant Mol. Biol. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated
with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, 5 mM MES
(2-[N-Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are pelleted gently
at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl,
125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution
containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7 and the protoplast density is
adjusted to about 4 x 10⁶ protoplasts per ml.

15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting
suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000), by gentle
inversion a few times at room temperature for 5 to 25 min. Protoplast culture medium known
in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated in the
culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient
expression of the introduced gene. Alternatively, transformed cells can be used to produce
transgenic callus, which in turn can be used to produce transgenic plants, by methods known
in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:988-991 (1985),
Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot
Suspension Cultures.
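As a rough guide to the DNA load per transformation, the protoplast density and suspension volume given above can be combined with the stated plasmid range. The Python sketch below does this arithmetic; it assumes the density is exactly 4 x 10⁶ per ml (the text says "about"), and the names are illustrative.

```python
PROTOPLASTS_PER_ML = 4e6   # adjusted density in MC solution
SUSPENSION_ML = 0.9        # volume mixed with plasmid DNA


def dna_per_million_protoplasts(dna_ug):
    """µg of plasmid DNA per 10^6 protoplasts in one transformation."""
    protoplasts = PROTOPLASTS_PER_ML * SUSPENSION_ML
    return dna_ug / (protoplasts / 1e6)


total_protoplasts = PROTOPLASTS_PER_ML * SUSPENSION_ML
low = dna_per_million_protoplasts(15)
high = dna_per_million_protoplasts(60)
print(f"{total_protoplasts:.1e} protoplasts; {low:.1f}-{high:.1f} µg DNA per 10^6")
```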

An additional deposit of an E. coli Library, E. coli LibA021800, was made at the
American Type Culture Collection in Manassas, Virginia, USA on February 22, 2000 to meet
the requirements of the Budapest Treaty for the international recognition of the deposit of
microorganisms.

The invention being thus described, it will be apparent to one of ordinary skill in the 
art that various modifications of the materials and methods for practicing the invention can be 
made. Such modifications are to be considered within the scope of the invention as defined 
by the following claims. 

Each of the references from the patent and periodical literature cited herein is hereby
expressly incorporated in its entirety by such citation.

CLAIMS

What is claimed is: 

1. An isolated nucleic acid molecule comprising a nucleic acid having a 
nucleotide sequence which encodes an amino acid sequence exhibiting at least 40% sequence 
identity to an amino acid sequence encoded by 

(a) a nucleotide sequence described in Tables 1 and/or 2 or a fragment thereof;
or

(b) a complement of a nucleotide sequence shown in Tables 1 and/or 2 or a 
fragment thereof. 

2. An isolated nucleic acid molecule comprising a nucleic acid having a 
nucleotide sequence which exhibits at least 65% sequence identity to 

(a) a nucleotide sequence shown in Tables 1 and/or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence described in Tables 1 and/or 2 or a
fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a 
nucleotide sequence which exhibits at least 65% sequence identity to a gene comprising 

(a) a nucleotide sequence shown in Tables 1 and/or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence described in Tables 1 and/or 2 or a
fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated 
nucleotide sequence according to claim 1, such that the reverse nucleotide sequence has a 
sequence order which is the reverse of the sequence order of said isolated nucleotide 
sequence according to claim 1. 

5. An isolated nucleic acid molecule comprising a nucleic acid capable of 
hybridizing to a nucleic acid having a sequence selected from the group consisting of: 

(a) a nucleotide sequence which is shown in Tables 1 and/or 2; and 

(b) a nucleotide sequence which is complementary to a nucleotide sequence
shown in Tables 1 and/or 2;

under conditions that permit formation of a nucleic acid duplex at a temperature from about
40°C to 48°C below the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to claim 1, wherein said nucleic acid
comprises an open reading frame.

7. The isolated nucleic acid molecule of claim 1, wherein said nucleic acid is
capable of functioning as a promoter, a 3' end termination sequence, an untranslated region
(UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a 
promoter and comprises a sequence selected from the group consisting of a TATA box 
sequence, a CAAT box sequence, a motif of GCAATCG or any transcription-factor binding 
sequence, and any combination thereof. 

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid 
sequence is a regulatory sequence which is capable of promoting seed-specific expression, 
embryo-specific expression, ovule-specific expression, tapetum-specific expression or root- 
specific expression of a sequence or any combination thereof. 

10. A vector construct comprising a nucleic acid molecule according to claim 1, 
wherein said nucleic acid molecule is heterologous to any element in said vector construct. 

11. A vector construct comprising: 

(a) a first nucleic acid having a regulatory sequence capable of causing 
transcription and/or translation; and 

(b) a second nucleic acid having the sequence of the isolated nucleic acid
molecule according to claim 1;

wherein said first and second nucleic acids are operably linked and 

wherein said second nucleic acid is heterologous to any element in said vector construct. 

12. The vector construct according to claim 11, wherein said first nucleic acid is 
native to said second nucleic acid. 

13. The vector construct according to claim 11, wherein said first nucleic acid is 
heterologous to said second nucleic acid. 

14. A vector construct comprising: 

(c) a first nucleic acid having the sequence of the isolated nucleic acid 
molecule according to claim 7; and 

(d) a second nucleic acid; 

wherein said first and second nucleic acids are operably linked and

wherein said first nucleic acid is heterologous to any element in said vector construct. 

15. The vector construct according to claim 14, wherein said first nucleic acid is 
native to said second nucleic acid. 

16. The vector construct according to claim 14, wherein said first nucleic acid is
heterologous to said second nucleic acid.

17. A host cell comprising an isolated nucleic acid molecule according to claim 1,
wherein said nucleic acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of claim 10. 

19. A host cell comprising a vector construct of claim 11. 

20. A host cell comprising a vector construct of claim 12. 

21. A host cell comprising a vector construct of claim 13. 

22. A host cell comprising a vector construct of claim 14. 

23. A host cell comprising a vector construct of claim 15. 

24. A host cell comprising a vector construct of claim 16. 

25. An isolated polypeptide comprising an amino acid sequence 

(a) exhibiting at least 40% sequence identity to an amino acid sequence encoded
by a sequence shown in Tables 1 and/or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide 
encoded by said nucleotide sequence shown in Tables 1 and/or 2 or a fragment 
thereof. 

26. The isolated polypeptide of claim 25, wherein said amino acid sequence 
exhibits at least 75% sequence identity to an amino acid sequence encoded by a 
sequence shown in Tables 1 and/or 2 or a fragment thereof. 

27. The isolated polypeptide of claim 25, wherein said amino acid sequence 
exhibits at least 85% sequence identity to an amino acid sequence encoded by a 
sequence shown in Tables 1 and/or 2 or a fragment thereof. 

28. The isolated polypeptide of claim 25, wherein said amino acid sequence
exhibits at least 90% sequence identity to an amino acid sequence encoded by a
sequence shown in Tables 1 and/or 2 or a fragment thereof.

29. An antibody capable of binding the isolated polypeptide of claim 25. 

30. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to claim 1; and 

(b) contacting said isolated nucleic acid with said host cell under conditions that
permit insertion of said nucleic acid into said host cell.

31. A method of transforming a host cell which comprises contacting a host cell 
with a vector construct according to claim 10. 

32. A method of transforming a host cell which comprises contacting a host cell
with a vector construct according to claim 11.

33. A method of transforming a host cell which comprises contacting a host cell
with a vector construct according to claim 12.

34. A method of transforming a host cell which comprises contacting a host cell 
with a vector construct according to claim 13. 

35. A method of transforming a host cell which comprises contacting a host cell 
with a vector construct according to claim 14. 

36. A method of transforming a host cell which comprises contacting a host cell 
with a vector construct according to claim 15. 

37. A method of transforming a host cell which comprises contacting a host cell 
with a vector construct according to claim 16. 

38. A method of modulating transcription and/or translation of a nucleic acid in a 
host cell comprising: 

(a) providing the host cell of claim 17; and 

(b) culturing said host cell under conditions that permit transcription or 
translation. 

39. A method for detecting a nucleic acid in a sample which comprises: 

(a) providing an isolated nucleic acid molecule according to claim 1; 

(b) contacting said isolated nucleic acid molecule with a sample under 
conditions which permit a comparison of the sequence of said isolated 
nucleic acid molecule with the sequence of DNA in said sample; and 

(c) analyzing the result of said comparison. 

40. The method according to claim 39, wherein said isolated nucleic acid 
molecule and said sample are contacted under conditions which permit the formation 
of a duplex between complementary nucleic acid sequences. 

41. A plant or cell of a plant which comprises a nucleic acid molecule according 
to claim 1 which is exogenous to said plant or plant cell. 

42. A plant or cell of a plant which comprises a nucleic acid molecule according 
to claim 1, wherein said nucleic acid molecule is heterologous to said plant or said 
cell of a plant. 

43. A plant or cell of a plant which has been transformed with a nucleic acid 
molecule according to claim 1. 

44. A plant or cell of a plant which comprises a vector construct according to
claim 10.

45. A plant or cell of a plant which has been transformed with a vector construct
according to claim 10.

46. A plant which has been regenerated from a plant cell according to claim 41. 

47. A plant which has been regenerated from a plant cell according to claim 42. 

48. A plant which has been regenerated from a plant cell according to claim 43. 

49. A plant which has been regenerated from a plant cell according to claim 44. 

50. A plant which has been regenerated from a plant cell according to claim 45.

ABSTRACT OF THE DISCLOSURE 

The present invention provides DNA molecules that constitute fragments of the
genome of a plant, and polypeptides encoded thereby. The DNA molecules are useful for
specifying a gene product in cells, either as a promoter or as a protein coding sequence or as
a UTR or as a 3' termination sequence, and are also useful in controlling the behavior of a
gene in the chromosome, in controlling the expression of a gene or as tools for genetic
mapping, recognizing or isolating identical or related DNA fragments, or identification of a
particular individual organism, or for clustering of a group of organisms with a common trait.

Maximum Length Sequence:

related to: 
Clone IDs: 

31493 

15490 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1 

- Ceres seq_id 1580171 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2 

- Ceres seq_id 1580172 

- Location of start within SEQ ID NO 1: at 226 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc finger, C3HC4 type (RING finger) 

- Location within SEQ ID NO 2 : from 75 to 116 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1 

- gi No. 3790575 

- Description: (AF078825) RING-H2 finger protein RHA3b [Arabidopsis 

thaliana] 

- % Identity: 100 

- Alignment Length: 159 

- Location of Alignment in SEQ ID NO 2: from 4 to 162 

Maximum Length Sequence: 

related to: 
Clone IDs: 
20782 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3 

- Ceres seq_id 1580240
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 4 

- Ceres seq_id 1580241

- Location of start within SEQ ID NO 3: at 34 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 2 

- gi No. 2285792 

- Description: (AB004568) cyanase [Arabidopsis thaliana] 
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 98.2 

- Alignment Length: 168 

- Location of Alignment in SEQ ID NO 4: from 1 to 168 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 5 

- Ceres seq_id 1580242 

- Location of start within SEQ ID NO 3: at 244 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 3 

- gi No. 2285792

- Description: (AB004568) cyanase [Arabidopsis thaliana]
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 98.2 

- Alignment Length: 168 

- Location of Alignment in SEQ ID NO 5: from 1 to 98 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 6 

- Ceres seq_id 1580243

- Location of start within SEQ ID NO 3: at 247 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 4 

- gi No. 2285792 

- Description: (AB004568) cyanase [Arabidopsis thaliana] 
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 98.2 

- Alignment Length: 168 

- Location of Alignment in SEQ ID NO 6: from 1 to 97 

Maximum Length Sequence : 

related to: 
Clone IDs: 
25200 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 7 

- Ceres seq_id 1580263 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 8 

- Ceres seq_id 1580264 

- Location of start within SEQ ID NO 7: at 54 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- lactate/malate dehydrogenase 

- Location within SEQ ID NO 8: from 101 to 420 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 5 

- gi No. 2827076 

- Description: (AF020269) malate dehydrogenase precursor [Medicago 

sativa] 

- % Identity: 81 

- Alignment Length: 444

- Location of Alignment in SEQ ID NO 8 : from 1 to 443 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 9 

- Ceres seq_id 1580265 

- Location of start within SEQ ID NO 7 : at 60 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- lactate/malate dehydrogenase 

- Location within SEQ ID NO 9: from 99 to 418 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 6 

- gi No. 2827076

- Description: (AF020269) malate dehydrogenase precursor [Medicago

sativa] 

- % Identity: 81 

- Alignment Length: 444

- Location of Alignment in SEQ ID NO 9: from 1 to 441 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 10 

- Ceres seq_id 1580266 

- Location of start within SEQ ID NO 7 : at 381 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- lactate/malate dehydrogenase 

- Location within SEQ ID NO 10: from 1 to 311 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 7 

- gi No. 2827076 

- Description: (AF020269) malate dehydrogenase precursor [Medicago

sativa] 

- % Identity: 81 

- Alignment Length: 444 

- Location of Alignment in SEQ ID NO 10: from 1 to 334 

Maximum Length Sequence: 

related to: 
Clone IDs: 
37242 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 11 

- Ceres seq_id 1580285 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 12 

- Ceres seq_id 1580286 

- Location of start within SEQ ID NO 11: at 216 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Cytidine and deoxycytidylate deaminase zinc-binding region 

- Location within SEQ ID NO 12: from 80 to 119 aa . 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 13 

- Ceres seq_id 1580287 

- Location of start within SEQ ID NO 11: at 423 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Cytidine and deoxycytidylate deaminase zinc-binding region 

- Location within SEQ ID NO 13: from 11 to 50 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
27909 
17864

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 14 

- Ceres seq_id 1580305 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 15 

- Ceres seq_id 1580306 

- Location of start within SEQ ID NO 14: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Peroxidase 

- Location within SEQ ID NO 15: from 64 to 345 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 8 

- gi No. 1890319 

- Description: (Y11792) peroxidase ATP27a [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 204 

- Location of Alignment in SEQ ID NO 15: from 142 to 345 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 16 

- Ceres seq_id 1580307 

- Location of start within SEQ ID NO 14: at 47 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Peroxidase 

- Location within SEQ ID NO 16: from 49 to 330 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 9 

- gi No. 1890319 

- Description: (Y11792) peroxidase ATP27a [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 204 

- Location of Alignment in SEQ ID NO 16: from 127 to 330 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 17 

- Ceres seq_id 1580308 

- Location of start within SEQ ID NO 14: at 71 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peroxidase 

- Location within SEQ ID NO 17: from 41 to 322 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 10 

- gi No. 1890319 

- Description: (Y11792) peroxidase ATP27a [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 204 

- Location of Alignment in SEQ ID NO 17: from 119 to 322 

Maximum Length Sequence: 

related to: 
Clone IDs: 
21920

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 18 

- Ceres seq_id 1580328 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 19 

- Ceres seq_id 1580329 

- Location of start within SEQ ID NO 18: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 11 

- gi No. 2454182 

- Description: (U80185) pyruvate dehydrogenase E1 alpha subunit
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 72 

- Location of Alignment in SEQ ID NO 19: from 1 to 72 

Maximum Length Sequence: 

related to: 
Clone IDs: 
35447 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 20 

- Ceres seq_id 1580388 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 21 

- Ceres seq_id 1580389 

- Location of start within SEQ ID NO 20: at 133 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Cystatin domain 

- Location within SEQ ID NO 21: from 87 to 141 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 22 

- Ceres seq_id 1580390 

- Location of start within SEQ ID NO 20: at 142 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cystatin domain 

- Location within SEQ ID NO 22: from 84 to 138 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

34004 

6247 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 23 

- Ceres seq_id 1580426 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 24 

- Ceres seq_id 1580427

- Location of start within SEQ ID NO 23: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 24: from 1 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 12 

- gi No. 99772 

- Description: ubiquitin 81-aa extension protein 2 - Arabidopsis
thaliana >gi|166936 (J05540) ubiquitin extension protein (UBQ6) [Arabidopsis
thaliana] >gi|3522953|gb|AAC34235.1| (AC004411) ubiquitin extension protein
(UBQ6) [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 157 

- Location of Alignment in SEQ ID NO 24: from 1 to 157 

Maximum Length Sequence: 

related to: 
Clone IDs: 
25313 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 25 

- Ceres seq_id 1580481
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 26 

- Ceres seq_id 1580482 

- Location of start within SEQ ID NO 25: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 13 

- gi No. 2708813 

- Description: (AF037362) ATA20 [Arabidopsis thaliana] 
- % Identity: 79.1

- Alignment Length: 217 

- Location of Alignment in SEQ ID NO 26: from 11 to 221 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 27 

- Ceres seq_id 1580483 

- Location of start within SEQ ID NO 25: at 33 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 14 

- gi No. 2708813 

- Description: (AF037362) ATA20 [Arabidopsis thaliana] 

- % Identity: 79.1 

- Alignment Length: 217 

- Location of Alignment in SEQ ID NO 27: from 1 to 211 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 28 

- Ceres seq_id 1580484 

- Location of start within SEQ ID NO 25: at 610 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Dehydrins 

- Location within SEQ ID NO 28: from 22 to 118 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 15 

- gi No. 2708813 

- Description: (AF037362) ATA20 [Arabidopsis thaliana] 

- % Identity: 99.2 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 28: from 21 to 139 



Maximum Length Sequence: 

related to: 
Clone IDs: 

27178 

30369 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 29

- Ceres seq_id 1580511 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 30

- Ceres seq_id 1580512

- Location of start within SEQ ID NO 29: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S24e 

- Location within SEQ ID NO 30: from 24 to 108 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 16 

- gi No. 445612 

- Description: ribosomal protein S19 [Solanum tuberosum]

- % Identity: 90.2 

- Alignment Length: 133 

- Location of Alignment in SEQ ID NO 30: from 1 to 133 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 31 

- Ceres seq_id 1580513 

- Location of start within SEQ ID NO 29: at 142 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S24e 

- Location within SEQ ID NO 31: from 11 to 95 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 17 

- gi No. 445612 

- Description: ribosomal protein S19 [Solanum tuberosum] 

- % Identity: 90.2 

- Alignment Length: 133 

- Location of Alignment in SEQ ID NO 31: from 1 to 120 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 32 

- Ceres seq_id 1580514 

- Location of start within SEQ ID NO 29: at 335 nt. 



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 1 
Page 8 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

100290 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 33 

- Ceres seq_id 1580599
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 34 

- Ceres seq_id 1580600 

- Location of start within SEQ ID NO 33: at 247 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Kelch motif 

- Location within SEQ ID NO 34: from 7 to 58 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 18 

- gi No. 4587545 

- Description: (AC006577) Identical to mRNA gb|X71915 from
A.thaliana and contains the Kelch motif. ESTs gb|T04568, gb|T21762,
gb|H38255, gb|Z18380, gb|R30291, gb|T21120, gb|R90517, gb|T75605, gb|R30422,
gb|T04568, gb|T21762, gb|...

- % Identity: 99.4 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 34: from 1 to 88 



Maximum Length Sequence: 

related to: 
Clone IDs: 

102475 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 35 

- Ceres seq_id 1580618 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 36

- Ceres seq_id 1580619 

- Location of start within SEQ ID NO 35: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 36: from 28 to 103 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 19 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 36: from 28 to 104 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 37

- Ceres seq_id 1580620 

- Location of start within SEQ ID NO 35: at 82 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 37: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 20 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 

>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 37: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 
11671 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 38 

- Ceres seq_id 1580629
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 39 

- Ceres seq_id 1580630 

- Location of start within SEQ ID NO 38: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 39: from 27 to 90 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 21 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 

>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 64 

- Location of Alignment in SEQ ID NO 39: from 27 to 90 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 40

- Ceres seq_id 1580631

- Location of start within SEQ ID NO 38: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 40: from 1 to 64 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 22 

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 64 

- Location of Alignment in SEQ ID NO 40: from 1 to 64 



Maximum Length Sequence:

related to: 
Clone IDs: 

108202 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 41 

- Ceres seq_id 1580663 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 42 

- Ceres seq_id 1580664

- Location of start within SEQ ID NO 41: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 43

- Ceres seq_id 1580665

- Location of start within SEQ ID NO 41: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 43: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 23 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 43: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

110506 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 44

- Ceres seq_id 1580681
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 45

- Ceres seq_id 1580682 

- Location of start within SEQ ID NO 44: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 45: from 27 to 102 aa.



(Dp) Related Amino Acid Sequences

- Alignment No. 24

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 45: from 27 to 103 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 46

- Ceres seq_id 1580683

- Location of start within SEQ ID NO 44: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 46: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 25 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 46: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

112665 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 47

- Ceres seq_id 1580697
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 48

- Ceres seq_id 1580698

- Location of start within SEQ ID NO 47: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 48: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 26

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 48: from 27 to 103 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 49

- Ceres seq_id 1580699

- Location of start within SEQ ID NO 47: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 49: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 27 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 49: from 1 to 77



Maximum Length Sequence: 

related to: 
Clone IDs: 

113731 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 50 

- Ceres seq_id 1580706 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 51

- Ceres seq_id 1580707 

- Location of start within SEQ ID NO 50: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 51: from 25 to 100 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 28 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 51: from 25 to 101 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 52 

- Ceres seq_id 1580708 

- Location of start within SEQ ID NO 50: at 74 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 52: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 29

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 52: from 1 to 77

Maximum Length Sequence:

related to: 
Clone IDs: 
9633 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 53 
- Ceres seq_id 1580743

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 54 

- Ceres seq_id 1580744 

- Location of start within SEQ ID NO 53: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 30 

- gi No. 5734522 

- Description: (AJ245631) photosystem I subunit VI precursor 
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 145 

- Location of Alignment in SEQ ID NO 54: from 1 to 145 

Maximum Length Sequence: 

related to: 
Clone IDs: 

107735 

30013 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 55 

- Ceres seq_id 1580787
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 56 

- Ceres seq_id 1580788 

- Location of start within SEQ ID NO 55: at 139 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 31 

- gi No. 3513730 

- Description: (AF080118) No definition line found [Arabidopsis
thaliana]

- % Identity: 77.6 

- Alignment Length: 203 

- Location of Alignment in SEQ ID NO 56: from 17 to 217 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 57 

- Ceres seq_id 1580789

- Location of start within SEQ ID NO 55: at 193 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 32 

- gi No. 3513730 

- Description: (AF080118) No definition line found [Arabidopsis
thaliana]

- % Identity: 77.6

- Alignment Length: 203

- Location of Alignment in SEQ ID NO 57: from 1 to 199 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 58 

- Ceres seq_id 1580790 

- Location of start within SEQ ID NO 55: at 199 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 33 

- gi No. 3513730 

- Description: (AF080118) No definition line found [Arabidopsis
thaliana]

- % Identity: 77.6 

- Alignment Length: 203 

- Location of Alignment in SEQ ID NO 58: from 1 to 197 



Maximum Length Sequence: 

related to: 
Clone IDs: 

115919 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 59 

- Ceres seq_id 1580805 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 60 

- Ceres seq_id 1580806 

- Location of start within SEQ ID NO 59: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 60: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 34 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 60: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 61 

- Ceres seq_id 1580807

- Location of start within SEQ ID NO 59: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 61: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 35 

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 61: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

119262 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 62 

- Ceres seq_id 1580822 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 63 

- Ceres seq_id 1580823 

- Location of start within SEQ ID NO 62: at 74 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 63: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 36 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 

>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 63: from 1 to 77 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 64 

- Ceres seq_id 1580824 

- Location of start within SEQ ID NO 62: at 348 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

14853 

32894 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 65 

- Ceres seq_id 1580870
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 66 

- Ceres seq_id 1580871 

- Location of start within SEQ ID NO 65: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Nucleosome assembly protein (NAP) 

- Location within SEQ ID NO 66: from 45 to 293 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 37 

- gi No. 549975 

- Description: (U12858) nucleosome assembly protein I-like protein;
similar to mouse nap I, PIR Accession Number JS0707 [Arabidopsis thaliana]

- % Identity: 76.3

- Alignment Length: 378

- Location of Alignment in SEQ ID NO 66: from 1 to 371 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 67 

- Ceres seq_id 1580872

- Location of start within SEQ ID NO 65: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Nucleosome assembly protein (NAP) 

- Location within SEQ ID NO 67: from 43 to 291 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 38 

- gi No. 549975 

- Description: (U12858) nucleosome assembly protein I-like protein; 
similar to mouse nap I, PIR Accession Number JS0707 [Arabidopsis thaliana]

- % Identity: 76.3

- Alignment Length: 378 

- Location of Alignment in SEQ ID NO 67: from 1 to 369 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 68 

- Ceres seq_id 1580873

- Location of start within SEQ ID NO 65: at 309 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Nucleosome assembly protein (NAP) 

- Location within SEQ ID NO 68: from 1 to 233 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 39 

- gi No. 549975 

- Description: (U12858) nucleosome assembly protein I-like protein;
similar to mouse nap I, PIR Accession Number JS0707 [Arabidopsis thaliana]

- % Identity: 76.3

- Alignment Length: 378

- Location of Alignment in SEQ ID NO 68: from 1 to 311 

Maximum Length Sequence: 

related to: 
Clone IDs: 

145318 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 69 

- Ceres seq_id 1580898
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 70

- Ceres seq_id 1580899 

- Location of start within SEQ ID NO 69: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 70: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 40

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 70: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 71 

- Ceres seq_id 1580900 

- Location of start within SEQ ID NO 69: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 71: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 41 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 71: from 1 to 77



Maximum Length Sequence:

related to: 
Clone IDs: 

146464 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 72 

- Ceres seq_id 1580910 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 73 

- Ceres seq_id 1580911 

- Location of start within SEQ ID NO 72: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 73: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 42 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|10864
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 73: from 27 to 103 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 74

- Ceres seq_id 1580912

- Location of start within SEQ ID NO 72: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 74: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 43 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 74: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs:
4609 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 75 

- Ceres seq_id 1580921 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 76

- Ceres seq_id 1580922

- Location of start within SEQ ID NO 75: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 76: from 101 to 168 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 77 

- Ceres seq_id 1580923 

- Location of start within SEQ ID NO 75: at 237 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 77: from 65 to 132 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 78

- Ceres seq_id 1580924

- Location of start within SEQ ID NO 75: at 270 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 78: from 54 to 121 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

15801 

252223 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 79

- Ceres seq_id 1580932
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 80

- Ceres seq_id 1580933

- Location of start within SEQ ID NO 79: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 44

- gi No. 4914432 

- Description: (AL050351) ribosomal protein S25 [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 80: from 1 to 108

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 81 

- Ceres seq_id 1580934 

- Location of start within SEQ ID NO 79: at 246 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 45

- gi No. 4914432

- Description: (AL050351) ribosomal protein S25 [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 81: from 1 to 69 

Maximum Length Sequence: 

related to: 
Clone IDs: 
6179 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 82 

- Ceres seq_id 1580939 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 83 

- Ceres seq_id 1580940 

- Location of start within SEQ ID NO 82: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif

- Location within SEQ ID NO 83: from 39 to 184 aa.



(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 84

- Ceres seq_id 1580941 

- Location of start within SEQ ID NO 82: at 31 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 84: from 29 to 174 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 85 

- Ceres seq_id 1580942 

- Location of start within SEQ ID NO 82: at 481 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

147302 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 86

- Ceres seq_id 1580968 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 87 

- Ceres seq_id 1580969 

- Location of start within SEQ ID NO 86: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 87: from 27 to 102 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 46

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 

>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 87: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 88 

- Ceres seq_id 1580970 

- Location of start within SEQ ID NO 86: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 88: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 47

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 88: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

38624 

18677 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 89

- Ceres seq_id 1580983 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 90 

- Ceres seq_id 1580984 

- Location of start within SEQ ID NO 89: at 995 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GMC oxidoreductases 

- Location within SEQ ID NO 90: from 1 to 58 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 48

- gi No. 4903018

- Description: (AB027507) ACE [Arabidopsis thaliana]

- % Identity: 99.8

- Alignment Length: 520 

- Location of Alignment in SEQ ID NO 90: from 1 to 294 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 91 

- Ceres seq_id 1580985 

- Location of start within SEQ ID NO 89: at 1001 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GMC oxidoreductases 

- Location within SEQ ID NO 91: from 1 to 56 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 49

- gi No. 4903018 

- Description: (AB027507) ACE [Arabidopsis thaliana] 

- % Identity: 99.8 

- Alignment Length: 520 

- Location of Alignment in SEQ ID NO 91: from 1 to 292 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 92 

- Ceres seq_id 1580986 

- Location of start within SEQ ID NO 89: at 1085 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GMC oxidoreductases 

- Location within SEQ ID NO 92: from 90 to 249 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 50 

- gi No. 4903018 

- Description: (AB027507) ACE [Arabidopsis thaliana] 

- % Identity: 99.8 

- Alignment Length: 520 

- Location of Alignment in SEQ ID NO 92: from 1 to 264 

Maximum Length Sequence: 

related to: 
Clone IDs: 
37036 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 93 

- Ceres seq_id 1580995 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 94 

- Ceres seq_id 1580996 

- Location of start within SEQ ID NO 93: at 146 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DnaJ domain

- Location within SEQ ID NO 94: from 4 to 70 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 51 

- gi No. 4914337 

- Description: (AC005489) F14N23.23 [Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 349

- Location of Alignment in SEQ ID NO 94: from 1 to 349



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 95 

- Ceres seq_id 1580997 

- Location of start within SEQ ID NO 93: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 52 

- gi No. 4914337 

- Description: (AC005489) F14N23.23 [Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 349

- Location of Alignment in SEQ ID NO 95: from 1 to 323 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 96 

- Ceres seq_id 1580998 

- Location of start within SEQ ID NO 93: at 230 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 53 

- gi No. 4914337 

- Description: (AC005489) F14N23.23 [Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 349

- Location of Alignment in SEQ ID NO 96: from 1 to 321

Maximum Length Sequence: 

related to: 
Clone IDs: 

100156 

19721 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 97 

- Ceres seq_id 1581038 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 98 

- Ceres seq_id 1581039

- Location of start within SEQ ID NO 97: at 120 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- AP2 domain 

- Location within SEQ ID NO 98: from 49 to 107 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 99 

- Ceres seq_id 1581040 

- Location of start within SEQ ID NO 97: at 168 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- AP2 domain 

- Location within SEQ ID NO 99: from 33 to 91 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 100 

- Ceres seq_id 1581041

- Location of start within SEQ ID NO 97: at 285 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- AP2 domain 

- Location within SEQ ID NO 100: from 1 to 52 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

149939 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 101 

- Ceres seq_id 1581046
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 102 

- Ceres seq_id 1581047 

- Location of start within SEQ ID NO 101: at 1 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Lyase

- Location within SEQ ID NO 102: from 9 to 167 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 54

- gi No. 1769568

- Description: (U82202) fumarase; fumarate hydratase [Arabidopsis

- % Identity: 93.1

- Alignment Length: 160

- Location of Alignment in SEQ ID NO 102: from 9 to 167

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 103 

- Ceres seq_id 1581048

- Location of start within SEQ ID NO 101: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Lyase 

- Location within SEQ ID NO 103: from 1 to 137 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 55

- gi No. 1769568

- Description: (U82202) fumarase; fumarate hydratase [Arabidopsis

- % Identity: 93.1

- Alignment Length: 160

- Location of Alignment in SEQ ID NO 103: from 1 to 137

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 104

- Ceres seq_id 1581049

- Location of start within SEQ ID NO 101: at 151 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lyase 

- Location within SEQ ID NO 104: from 1 to 117 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 56

- gi No. 1769568

- Description: (U82202) fumarase; fumarate hydratase [Arabidopsis

- % Identity: 93.1

- Alignment Length: 160

- Location of Alignment in SEQ ID NO 104: from 1 to 117

Maximum Length Sequence: 

related to: 
Clone IDs: 

151330 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 105 

- Ceres seq_id 1581056 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 106 

- Ceres seq_id 1581057 

- Location of start within SEQ ID NO 105: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 106: from 27 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 57 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 106: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 107 

- Ceres seq_id 1581058 

- Location of start within SEQ ID NO 105: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 107: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 58 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 107: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

153415 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 108 

- Ceres seq_id 1581067 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 109 

- Ceres seq_id 1581068 

- Location of start within SEQ ID NO 108: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 109: from 26 to 101 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 59

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 109: from 26 to 102

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 110 

- Ceres seq_id 1581069 

- Location of start within SEQ ID NO 108: at 77 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 110: from 1 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 60 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 110: from 1 to 77 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 111 

- Ceres seq_id 1581070 

- Location of start within SEQ ID NO 108: at 351 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 61 

- gi No. 1173045 

- Description: 60S RIBOSOMAL PROTEIN L37A >gi|421866|pir||S34661
ribosomal protein L37a - turnip >gi|347062 (L21897) ribosomal protein
[Brassica rapa] >gi|395077|emb|CAA80864| (Z24739) ribosomal protein L37a
[Brassica rapa]

- % Identity: 90 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 111: from 28 to 56 



Maximum Length Sequence: 

related to: 
Clone IDs: 

154838 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 112 

- Ceres seq_id 1581075 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 113 

- Ceres seq_id 1581076 

- Location of start within SEQ ID NO 112: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peptidase family M3 

- Location within SEQ ID NO 113: from 1 to 138 aa.



(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 114 

- Ceres seq_id 1581077

- Location of start within SEQ ID NO 112: at 97 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Peptidase family M3 

- Location within SEQ ID NO 114: from 1 to 106 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

156364 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 115 

- Ceres seq_id 1581118 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 116 

- Ceres seq_id 1581119 

- Location of start within SEQ ID NO 115: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 116: from 27 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 62 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 116: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 117 

- Ceres seq_id 1581120 

- Location of start within SEQ ID NO 115: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 117: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 63 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 117: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

159215 

(Ac) cDNA Polynucleotide Sequence 



- Pat. Appln. SEQ ID NO 118 

- Ceres seq_id 1581137
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 119 

- Ceres seq_id 1581138

- Location of start within SEQ ID NO 118: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 120 

- Ceres seq_id 1581139 

- Location of start within SEQ ID NO 118: at 75 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 120: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 64 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 120: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

158076 

31290 

126073 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 121 

- Ceres seq_id 1581140 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 122 

- Ceres seq_id 1581141 

- Location of start within SEQ ID NO 121: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Iron/Ascorbate oxidoreductase family 

- Location within SEQ ID NO 122: from 15 to 272 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 65 

- gi No. 2781354 

- Description: (AC003113) F24O1.10 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 320 

- Location of Alignment in SEQ ID NO 122: from 1 to 320 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 123 

- Ceres seq_id 1581142

- Location of start within SEQ ID NO 121: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Iron/Ascorbate oxidoreductase family 

- Location within SEQ ID NO 123: from 11 to 268 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 66 

- gi No. 2781354 

- Description: (AC003113) F24O1.10 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 320

- Location of Alignment in SEQ ID NO 123: from 1 to 316 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 124 

- Ceres seq_id 1581143 

- Location of start within SEQ ID NO 121: at 148 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Iron/Ascorbate oxidoreductase family

- Location within SEQ ID NO 124: from 1 to 249 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 67 

- gi No. 2781354 

- Description: (AC003113) F24O1.10 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 320

- Location of Alignment in SEQ ID NO 124: from 1 to 297 

Maximum Length Sequence: 

related to: 
Clone IDs: 

21689 

92780 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 125 

- Ceres seq_id 1581172 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 126

- Ceres seq_id 1581173

- Location of start within SEQ ID NO 125: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Asparaginase 

- Location within SEQ ID NO 126: from 20 to 342 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 127 

- Ceres seq_id 1581174 

- Location of start within SEQ ID NO 125: at 57 nt. 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Asparaginase

- Location within SEQ ID NO 127: from 2 to 324 aa.
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 128 

- Ceres seq_id 1581175

- Location of start within SEQ ID NO 125: at 294 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Asparaginase 

- Location within SEQ ID NO 128: from 1 to 245 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
19111 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 129 

- Ceres seq_id 1581200 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 130 

- Ceres seq_id 1581201 

- Location of start within SEQ ID NO 129: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 130: from 27 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 68 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 98.7 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 130: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 131 

- Ceres seq_id 1581202 

- Location of start within SEQ ID NO 129: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 131: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 69 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana]

- % Identity: 98.7 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 131: from 1 to 77

Maximum Length Sequence: 

related to: 
Clone IDs: 
19891 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 132 

- Ceres seq_id 1581205 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 133 

- Ceres seq_id 1581206 

- Location of start within SEQ ID NO 132: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 133: from 27 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 70

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 133: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 134 

- Ceres seq_id 1581207 

- Location of start within SEQ ID NO 132: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 134: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 71 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 134: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

12629 

40191 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 135 

- Ceres seq_id 1581223 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 136 

- Ceres seq_id 1581224

- Location of start within SEQ ID NO 135: at 180 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Armadillo/beta-catenin-like repeats 

- Location within SEQ ID NO 136: from 194 to 235 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 72

- gi No. 2950210

- Description: (Y14615) Importin alpha-like protein [Arabidopsis
thaliana]

- % Identity: 98.5 

- Alignment Length: 535 

- Location of Alignment in SEQ ID NO 136: from 1 to 535 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 137 

- Ceres seq_id 1581225 

- Location of start within SEQ ID NO 135: at 276 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Armadillo/beta-catenin-like repeats 

- Location within SEQ ID NO 137: from 162 to 203 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 73

- gi No. 2950210

- Description: (Y14615) Importin alpha-like protein [Arabidopsis
thaliana]

- % Identity: 98.5 

- Alignment Length: 535 

- Location of Alignment in SEQ ID NO 137: from 1 to 503 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 138 

- Ceres seq_id 1581226 

- Location of start within SEQ ID NO 135: at 423 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Armadillo/beta-catenin-like repeats 

- Location within SEQ ID NO 138: from 113 to 154 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 74

- gi No. 2950210

- Description: (Y14615) Importin alpha-like protein [Arabidopsis
thaliana]

- % Identity: 98.5

- Alignment Length: 535

- Location of Alignment in SEQ ID NO 138: from 1 to 454

Maximum Length Sequence:

related to:
Clone IDs:

108385

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 139

- Ceres seq_id 1581382
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 140

- Ceres seq_id 1581383 

- Location of start within SEQ ID NO 139: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 75 

- gi No. 4538961 

- Description: (AL049488) isoleucine-tRNA ligase-like protein
[Arabidopsis thaliana] 

- % Identity: 92.3 

- Alignment Length: 39 

- Location of Alignment in SEQ ID NO 140: from 1 to 39 



Maximum Length Sequence: 

related to: 
Clone IDs: 

108774 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 141 

- Ceres seq_id 1581384 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 142 

- Ceres seq_id 1581385

- Location of start within SEQ ID NO 141: at 154 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- C2 domain 

- Location within SEQ ID NO 142: from 96 to 188 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 76

- gi No. 2129670

- Description: phosphoinositide-specific phospholipase C -
Arabidopsis thaliana >gi|857374|dbj|BAA09432| (D50804) phosphoinositide
specific phospholipase C [Arabidopsis thaliana]

- % Identity: 92.5 

- Alignment Length: 228 

- Location of Alignment in SEQ ID NO 142: from 1 to 214 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 143 

- Ceres seq_id 1581386 

- Location of start within SEQ ID NO 141: at 271 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- C2 domain 

- Location within SEQ ID NO 143: from 57 to 149 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 77 

- gi No. 2129670 

- Description: phosphoinositide-specific phospholipase C -
Arabidopsis thaliana >gi|857374|dbj|BAA09432| (D50804) phosphoinositide
specific phospholipase C [Arabidopsis thaliana]

- % Identity: 92.5 

- Alignment Length: 228 

- Location of Alignment in SEQ ID NO 143: from 1 to 175

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 144 

- Ceres seq_id 1581387 

- Location of start within SEQ ID NO 141: at 286 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- C2 domain 

- Location within SEQ ID NO 144: from 52 to 144 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 78

- gi No. 2129670

- Description: phosphoinositide-specific phospholipase C -
Arabidopsis thaliana >gi|857374|dbj|BAA09432| (D50804) phosphoinositide
specific phospholipase C [Arabidopsis thaliana]

- % Identity: 92.5 

- Alignment Length: 228 

- Location of Alignment in SEQ ID NO 144: from 1 to 170 



Maximum Length Sequence: 

related to: 
Clone IDs: 

142668 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 145 

- Ceres seq_id 1581438 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 146 

- Ceres seq_id 1581439 

- Location of start within SEQ ID NO 145: at 45 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pyridoxal-phosphate dependent enzymes 

- Location within SEQ ID NO 146: from 11 to 300 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 79

- gi No. 4996618 

- Description: (AB024283) cysteine synthase [Arabidopsis thaliana] 

- % Identity: 99.4 

- Alignment Length: 323 

- Location of Alignment in SEQ ID NO 146: from 1 to 323



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 147 

- Ceres seq_id 1581440 

- Location of start within SEQ ID NO 145: at 102 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pyridoxal-phosphate dependent enzymes 

- Location within SEQ ID NO 147: from 1 to 281 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 80

- gi No. 4996618 

- Description: (AB024283) cysteine synthase [Arabidopsis thaliana] 

- % Identity: 99.4

- Alignment Length: 323

- Location of Alignment in SEQ ID NO 147: from 1 to 304 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 148 

- Ceres seq_id 1581441 

- Location of start within SEQ ID NO 145: at 162 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pyridoxal-phosphate dependent enzymes

- Location within SEQ ID NO 148: from 1 to 261 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 81 

- gi No. 4996618 

- Description: (AB024283) cysteine synthase [Arabidopsis thaliana]

- % Identity: 99.4 

- Alignment Length: 323 

- Location of Alignment in SEQ ID NO 148: from 1 to 284 



Maximum Length Sequence: 

related to: 
Clone IDs: 

20187 

16737 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 149

- Ceres seq_id 1581454 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 150 

- Ceres seq_id 1581455 

- Location of start within SEQ ID NO 149: at 189 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 150: from 9 to 82 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

27553 

99409 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 151 

- Ceres seq_id 1581498 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 152 

- Ceres seq_id 1581499 

- Location of start within SEQ ID NO 151: at 86 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 152: from 161 to 239 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 82

- gi No. 3510538

- Description: (U93167) expansin [Prunus armeniaca]

- % Identity: 79.8

- Alignment Length: 253 

- Location of Alignment in SEQ ID NO 152: from 4 to 253 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 153 

- Ceres seq_id 1581500 

- Location of start within SEQ ID NO 151: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 153: from 115 to 193 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 83 

- gi No. 3510538 

- Description: (U93167) expansin [Prunus armeniaca] 

- % Identity: 79.8

- Alignment Length: 253

- Location of Alignment in SEQ ID NO 153: from 1 to 207 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 154 

- Ceres seq_id 1581501 

- Location of start within SEQ ID NO 151: at 338 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 154: from 77 to 155 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 84

- gi No. 3510538 

- Description: (U93167) expansin [Prunus armeniaca] 

- % Identity: 79.8 

- Alignment Length: 253 

- Location of Alignment in SEQ ID NO 154: from 1 to 169 

Maximum Length Sequence: 

related to: 
Clone IDs: 

116085 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 155 

- Ceres seq_id 1581567 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 156 

- Ceres seq_id 1581568

- Location of start within SEQ ID NO 155: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S4/S9 

- Location within SEQ ID NO 156: from 8 to 144 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 85

- gi No. 1710780

- Description: 40S RIBOSOMAL PROTEIN S9 (S7)
>gi|1321917|emb|CAA65433| (X96613) cytoplasmic ribosomal protein S7
[Podospora anserina]

- % Identity: 74.6 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 156: from 8 to 144 

Maximum Length Sequence: 

related to: 
Clone IDs: 

126602 

15384 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 157 

- Ceres seq_id 1581585 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 158 

- Ceres seq_id 1581586

- Location of start within SEQ ID NO 157: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 86

- gi No. 3123264

- Description: 60S RIBOSOMAL PROTEIN L27
>gi|2244857|emb|CAB10279.1| (Z97337) ribosomal protein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 158: from 22 to 156 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 159 

- Ceres seq_id 1581587 

- Location of start within SEQ ID NO 157: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 87 

- gi No. 3123264 

- Description: 60S RIBOSOMAL PROTEIN L27
>gi|2244857|emb|CAB10279.1| (Z97337) ribosomal protein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 159: from 1 to 135

Maximum Length Sequence: 

related to: 
Clone IDs: 
37132 
6045 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 160 

- Ceres seq_id 1581608 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 161 

- Ceres seq_id 1581609

- Location of start within SEQ ID NO 160: at 157 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Papain family cysteine protease 

- Location within SEQ ID NO 161: from 135 to 360 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 88

- gi No. 1172872

- Description: CYSTEINE PROTEINASE RD19A PRECURSOR
>gi|541856|pir||JN0718 drought-inducible cysteine proteinase (EC 3.4.22.
RD19A precursor - Arabidopsis thaliana >gi|435618|dbj|BAA02373| (D13042)
thiol protease [Arabidopsis

- % Identity: 99.7 

- Alignment Length: 368 

- Location of Alignment in SEQ ID NO 161: from 1 to 368 

Maximum Length Sequence: 

related to: 
Clone IDs: 

120572 

15535 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 162

- Ceres seq_id 1581621 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 163 

- Ceres seq_id 1581622 

- Location of start within SEQ ID NO 162: at 206 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 163: from 169 to 437 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 164 

- Ceres seq_id 1581623 

- Location of start within SEQ ID NO 162: at 686 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 164: from 9 to 277 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 165 

- Ceres seq_id 1581624 

- Location of start within SEQ ID NO 162: at 893 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 165: from 1 to 208 aa.
(Dp) Related Amino Acid Sequences 



Maximum Length Sequence:

related to:
Clone IDs: 
37859 
42666 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 166 

- Ceres seq_id 1581926 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 167 

- Ceres seq_id 1581927 

- Location of start within SEQ ID NO 166: at 111 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 89

- gi No. 3935172 

- Description: (AC004557) F17L21.15 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 393

- Location of Alignment in SEQ ID NO 167: from 1 to 393 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 168 

- Ceres seq_id 1581928 

- Location of start within SEQ ID NO 166: at 123 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 90 

- gi No. 3935172 

- Description: (AC004557) F17L21.15 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 393 

- Location of Alignment in SEQ ID NO 168: from 1 to 389 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 169 

- Ceres seq_id 1581929 

- Location of start within SEQ ID NO 166: at 162 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 91 

- gi No. 3935172 

- Description: (AC004557) F17L21.15 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 393

- Location of Alignment in SEQ ID NO 169: from 1 to 376 

Maximum Length Sequence:

related to: 
Clone IDs: 

25432 

114578 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 170

- Ceres seq_id 1581971
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 171

- Ceres seq_id 1581972 

- Location of start within SEQ ID NO 170: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 172 

- Ceres seq_id 1581973 

- Location of start within SEQ ID NO 170: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 92 

- gi No. 5729802 

- Description: ref|NP_006692.1|pDIM1| similar to S. pombe dim1+
>gi|2565275 (AF023611) Dim1p homolog [Homo sapiens]

- % Identity: 85.2

- Alignment Length: 142

- Location of Alignment in SEQ ID NO 172: from 17 to 158 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 173 

- Ceres seq_id 1581974 

- Location of start within SEQ ID NO 170: at 51 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 93 

- gi No. 5729802 

- Description: ref|NP_006692.1|pDIM1| similar to S. pombe dim1+
>gi|2565275 (AF023611) Dim1p homolog [Homo sapiens]

- % Identity: 85.2

- Alignment Length: 142

- Location of Alignment in SEQ ID NO 173: from 1 to 142 

Maximum Length Sequence: 

related to: 
Clone IDs: 

101179 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 174 

- Ceres seq_id 1581981 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 175

- Ceres seq_id 1581982

- Location of start within SEQ ID NO 174: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 94 

- gi No. 100216 

- Description: extensin class II (clone uJ-2) - tomato 
>gi|1345538|emb|CAA39216| (X55686) extensin (class II) [Lycopersicon
esculentum] 

- % Identity: 70

- Alignment Length: 21

- Location of Alignment in SEQ ID NO 175: from 256 to 274 

Maximum Length Sequence: 

related to: 
Clone IDs: 

108648 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 176

- Ceres seq_id 1582014 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 177 

- Ceres seq_id 1582015 

- Location of start within SEQ ID NO 176: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc finger, C3HC4 type (RING finger)

- Location within SEQ ID NO 177: from 291 to 331 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 178 

- Ceres seq_id 1582016 

- Location of start within SEQ ID NO 176: at 237 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc finger, C3HC4 type (RING finger) 

- Location within SEQ ID NO 178: from 213 to 253 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 179

- Ceres seq_id 1582017

- Location of start within SEQ ID NO 176: at 483 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc finger, C3HC4 type (RING finger) 

- Location within SEQ ID NO 179: from 131 to 171 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
98981 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 180 

- Ceres seq_id 1582040
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 181 

- Ceres seq_id 1582041 

- Location of start within SEQ ID NO 180: at 243 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Uncharacterized protein family UPF0034 
- Location within SEQ ID NO 181: from 13 to 312 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 182 

- Ceres seq_id 1582042

- Location of start within SEQ ID NO 180: at 291 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Uncharacterized protein family UPF0034 

- Location within SEQ ID NO 182: from 1 to 296 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 183 

- Ceres seq_id 1582043 

- Location of start within SEQ ID NO 180: at 294 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Uncharacterized protein family UPF0034

- Location within SEQ ID NO 183: from 1 to 295 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

121063 

101843 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 184 

- Ceres seq_id 1582064 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 185 

- Ceres seq_id 1582065

- Location of start within SEQ ID NO 184: at 127 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thioredoxin 

- Location within SEQ ID NO 185: from 269 to 375 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 186

- Ceres seq_id 1582066

- Location of start within SEQ ID NO 184: at 346 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Thioredoxin 

- Location within SEQ ID NO 186: from 196 to 302 aa. 
(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 187

- Ceres seq_id 1582067 

- Location of start within SEQ ID NO 184: at 412 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Thioredoxin 

- Location within SEQ ID NO 187: from 174 to 280 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

230986 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 188 

- Ceres seq_id 1582076 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 189

- Ceres seq_id 1582077

- Location of start within SEQ ID NO 188: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 189: from 1 to 52 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 190 

- Ceres seq_id 1582078 

- Location of start within SEQ ID NO 188: at 120 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 190: from 17 to 101 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 191 
- Ceres seq_id 1582079

- Location of start within SEQ ID NO 188: at 147 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 191: from 8 to 92 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs:
93821 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 192 

- Ceres seq_id 1582098 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 193 

- Ceres seq_id 1582099 

- Location of start within SEQ ID NO 192: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Thiolase 

- Location within SEQ ID NO 193: from 35 to 426 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 95 

- gi No. 1542941 

- Description: (X78116) Acetoacetyl-coenzyme A thiolase [Raphanus sativus]

- % Identity: 75.5 

- Alignment Length: 400 

- Location of Alignment in SEQ ID NO 193: from 29 to 426 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 194 

- Ceres seq_id 1582100 

- Location of start within SEQ ID NO 192: at 70 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thiolase 

- Location within SEQ ID NO 194: from 12 to 403 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 96 

- gi No. 1542941 

- Description: (X78116) Acetoacetyl-coenzyme A thiolase [Raphanus sativus]

- % Identity: 75.5 

- Alignment Length: 400

- Location of Alignment in SEQ ID NO 194: from 6 to 403 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 195 

- Ceres seq_id 1582101 

- Location of start within SEQ ID NO 192: at 376 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thiolase 

- Location within SEQ ID NO 195: from 1 to 301 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 97 

- gi No. 1542941 

- Description: (X78116) Acetoacetyl-coenzyme A thiolase [Raphanus sativus]

- % Identity: 75.5 

- Alignment Length: 400

- Location of Alignment in SEQ ID NO 195: from 1 to 301 



Maximum Length Sequence: 

related to: 
Clone IDs: 

248034 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 196 

- Ceres seq_id 1582106 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 197 
- Ceres seq_id 1582107

- Location of start within SEQ ID NO 196: at 75 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Dehydrins 

- Location within SEQ ID NO 197: from 49 to 108 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 98 

- gi No. 4972049 

- Description: (AL078470) glycine-rich protein like [Arabidopsis thaliana]

- % Identity: 99.1 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 197: from 1 to 110 

Maximum Length Sequence: 

related to: 
Clone IDs: 

248224 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 198 

- Ceres seq_id 1582111 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 199 

- Ceres seq_id 1582112 

- Location of start within SEQ ID NO 198: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 99 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 51 

- Location of Alignment in SEQ ID NO 199: from 24 to 73

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 200 

- Ceres seq_id 1582113 

- Location of start within SEQ ID NO 198: at 72 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 100 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 51 
- Location of Alignment in SEQ ID NO 200: from 1 to 50



Maximum Length Sequence: 

related to: 
Clone IDs: 

249909 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 201

- Ceres seq_id 1582124 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 202 

- Ceres seq_id 1582125

- Location of start within SEQ ID NO 201: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 202: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 101 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 202: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 203 

- Ceres seq_id 1582126 

- Location of start within SEQ ID NO 201: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 203: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 102 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463 
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 203: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

255321 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 204 

- Ceres seq_id 1582179 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 205 

- Ceres seq_id 1582180

- Location of start within SEQ ID NO 204: at 3 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 205: from 27 to 102 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 103 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 205: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 206 

- Ceres seq_id 1582181 

- Location of start within SEQ ID NO 204: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 206: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 104 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|10864
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 206: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

260004 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 207 

- Ceres seq_id 1582190 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 208

- Ceres seq_id 1582191

- Location of start within SEQ ID NO 207: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 208: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 105 

- gi No. 2497886

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 208: from 27 to 103 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 209 

- Ceres seq_id 1582192 

- Location of start within SEQ ID NO 207: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 209: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 106 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 209: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

262837 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 210 

- Ceres seq_id 1582199 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 211 

- Ceres seq_id 1582200 

- Location of start within SEQ ID NO 210: at 1 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 211: from 28 to 103 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 107 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 211: from 28 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 212 

- Ceres seq_id 1582201 

- Location of start within SEQ ID NO 210: at 82 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 212: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 108

- gi No. 2497886 
- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 212: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

265299 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 213 

- Ceres seq_id 1582204 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 214 

- Ceres seq_id 1582205 

- Location of start within SEQ ID NO 213: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 214: from 27 to 102 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 109

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 214: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 215 

- Ceres seq_id 1582206 

- Location of start within SEQ ID NO 213: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 215: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 110 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 215: from 1 to 77 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 216 

- Ceres seq_id 1582207 

- Location of start within SEQ ID NO 213: at 355 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences



Maximum Length Sequence: 

related to: 
Clone IDs: 

270010 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 217 

- Ceres seq_id 1582236
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 218 

- Ceres seq_id 1582237 

- Location of start within SEQ ID NO 217: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 218: from 27 to 102 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 111

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 96.1 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 218: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 219 

- Ceres seq_id 1582238 

- Location of start within SEQ ID NO 217: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 219: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 112 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 96.1 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 219: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

270131 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 220 

- Ceres seq_id 1582239 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 221 

- Ceres seq_id 1582240 

- Location of start within SEQ ID NO 220: at 3 nt. 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 221: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 113 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|10864
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 221: from 27 to 103 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 222 

- Ceres seq_id 1582241 

- Location of start within SEQ ID NO 220: at 81 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 222: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 114 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 222: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

271244 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 223

- Ceres seq_id 1582245
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 224 

- Ceres seq_id 1582246 

- Location of start within SEQ ID NO 223: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 224: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 115 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|10864
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 224: from 27 to 103 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 225 

- Ceres seq_id 1582247

- Location of start within SEQ ID NO 223: at 81 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 225: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 116 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 225: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

271977 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 226 

- Ceres seq_id 1582248 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 227 

- Ceres seq_id 1582249 

- Location of start within SEQ ID NO 226: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 227: from 27 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 117 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 227: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 228 

- Ceres seq_id 1582250 

- Location of start within SEQ ID NO 226: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 228: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 118 

- gi No. 2497886 
- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 228: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

104788 

16044 

20668 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 229 

- Ceres seq_id 1582254 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 230 

- Ceres seq_id 1582255 

- Location of start within SEQ ID NO 229: at 715 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glycosyl hydrolase family 1 

- Location within SEQ ID NO 230: from 127 to 369 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 119 

- gi No. 585536 

- Description: MYROSINASE PRECURSOR (SINIGRINASE) (THIOGLUCOSIDASE)
>gi|1362006|pir||S56653 thioglucosidase (EC 3.2.3.1) - Arabidopsis thaliana
>gi|304115 (L11454) thioglucosidase [Arabidopsis thaliana]
>gi|871990|emb|CAA55786| (X79194) E=1e-196, N=1

- % Identity: 97 

- Alignment Length: 264

- Location of Alignment in SEQ ID NO 230: from 127 to 390 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 231 

- Ceres seq_id 1582256 

- Location of start within SEQ ID NO 229: at 763 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glycosyl hydrolase family 1 

- Location within SEQ ID NO 231: from 111 to 353 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 120 

- gi No. 585536 

- Description: MYROSINASE PRECURSOR (SINIGRINASE) (THIOGLUCOSIDASE)
>gi|1362006|pir||S56653 thioglucosidase (EC 3.2.3.1) - Arabidopsis thaliana
>gi|304115 (L11454) thioglucosidase [Arabidopsis thaliana]
>gi|871990|emb|CAA55786| (X79194) E=1e-196, N=1

- % Identity: 97 

- Alignment Length: 264

- Location of Alignment in SEQ ID NO 231: from 111 to 374 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 232 

- Ceres seq_id 1582257 
- Location of start within SEQ ID NO 229: at 775 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glycosyl hydrolase family 1 

- Location within SEQ ID NO 232: from 107 to 349 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 121 

- gi No. 585536 

- Description: MYROSINASE PRECURSOR (SINIGRINASE) (THIOGLUCOSIDASE)
>gi|1362006|pir||S56653 thioglucosidase (EC 3.2.3.1) - Arabidopsis thaliana
>gi|304115 (L11454) thioglucosidase [Arabidopsis thaliana]
>gi|871990|emb|CAA55786| (X79194) E=1e-196, N=1

- % Identity: 97 

- Alignment Length: 264 

- Location of Alignment in SEQ ID NO 232: from 107 to 370 

Maximum Length Sequence: 

related to: 
Clone IDs: 

101841 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 233 

- Ceres seq_id 1582293 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 234

- Ceres seq_id 1582294

- Location of start within SEQ ID NO 233: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 122

- gi No. 5452942 

- Description: (AF066061) glucosidase II beta-subunit [Mus musculus]

- % Identity: 75 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 234: from 208 to 231 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 235 

- Ceres seq_id 1582295 

- Location of start within SEQ ID NO 233: at 12 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 123 

- gi No. 5452942 

- Description: (AF066061) glucosidase II beta-subunit [Mus musculus]

- % Identity: 75 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 235: from 205 to 228 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 236 

- Ceres seq_id 1582296 

- Location of start within SEQ ID NO 233: at 33 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 124 

- gi No. 5452942 

- Description: (AF066061) glucosidase II beta-subunit [Mus musculus]

- % Identity: 75 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 236: from 198 to 221 



Maximum Length Sequence: 

related to: 
Clone IDs: 
33316 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 237 

- Ceres seq_id 1582315 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 238 

- Ceres seq_id 1582316 

- Location of start within SEQ ID NO 237: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 238: from 27 to 102 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 125 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 238: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 239 

- Ceres seq_id 1582317 

- Location of start within SEQ ID NO 237: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein

- Location within SEQ ID NO 239: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 126

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 239: from 1 to 77 



Maximum Length Sequence: 
related to: 
Clone IDs:

114950 
33601 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 240 

- Ceres seq_id 1582339 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 241 

- Ceres seq_id 1582340 

- Location of start within SEQ ID NO 240: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 242

- Ceres seq_id 1582341

- Location of start within SEQ ID NO 240: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 127 

- gi No. 4039153 

- Description: (AF104221) low temperature and salt responsive 
protein LTI6A [Arabidopsis thaliana] >gi|4325217|gb|AAD17302| (AF122005)
hydrophobic protein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 54 

- Location of Alignment in SEQ ID NO 242: from 1 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 243 

- Ceres seq_id 1582342 

- Location of start within SEQ ID NO 240: at 171 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 
34141 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 244 

- Ceres seq_id 1582349 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 245 

- Ceres seq_id 1582350 

- Location of start within SEQ ID NO 244: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 3-beta hydroxysteroid dehydrogenase/isomerase family 

- Location within SEQ ID NO 245: from 43 to 275 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 128 

- gi No. 2960364 
- Description: (AJ224986) cinnamoyl CoA reductase [Populus
balsamifera subsp. trichocarpa] 

- % Identity: 76.9 

- Alignment Length: 339 

- Location of Alignment in SEQ ID NO 245: from 35 to 371 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 246

- Ceres seq_id 1582351

- Location of start within SEQ ID NO 244: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 3-beta hydroxysteroid dehydrogenase/isomerase family

- Location within SEQ ID NO 246: from 9 to 241 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 129

- gi No. 2960364 

- Description: (AJ224986) cinnamoyl CoA reductase [Populus 
balsamifera subsp. trichocarpa] 

- % Identity: 76.9

- Alignment Length: 339 

- Location of Alignment in SEQ ID NO 246: from 1 to 337

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 247 

- Ceres seq_id 1582352 

- Location of start within SEQ ID NO 244: at 397 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 3-beta hydroxysteroid dehydrogenase/isomerase family 

- Location within SEQ ID NO 247: from 1 to 143 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 130 

- gi No. 2960364 

- Description: (AJ224986) cinnamoyl CoA reductase [Populus
balsamifera subsp. trichocarpa] 

- % Identity: 76.9 

- Alignment Length: 339 

- Location of Alignment in SEQ ID NO 247: from 1 to 239 

Maximum Length Sequence: 

related to: 
Clone IDs: 
38357 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 248 

- Ceres seq_id 1582398 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 249 

- Ceres seq_id 1582399 

- Location of start within SEQ ID NO 248: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 249: from 27 to 102 aa. 
(Dp) Related Amino Acid Sequences

- Alignment No. 131 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 249: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 250

- Ceres seq_id 1582400 

- Location of start within SEQ ID NO 248: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 250: from 1 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 132 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 250: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

118677 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 251 

- Ceres seq_id 1582409 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 252 

- Ceres seq_id 1582410 

- Location of start within SEQ ID NO 251: at 460 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 252: from 66 to 130 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 133 

- gi No. 4033468 

- Description: ARGININE/SERINE-RICH SPLICING FACTOR RSP40
>gi|2582641|emb|CAA67800| (X99437) splicing factor [Arabidopsis thaliana]
>gi|2980800|emb|CAA18176.1| (AL022197) splicing factor At-SRp40 [Arabidopsis
thaliana] 

- % Identity: 100 

- Alignment Length: 315 

- Location of Alignment in SEQ ID NO 252: from 3 to 317 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 253 

- Ceres seq_id 1582411 
- Location of start within SEQ ID NO 251: at 484 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 253: from 58 to 122 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 134 

- gi No. 4033468 

- Description: ARGININE/SERINE-RICH SPLICING FACTOR RSP40
>gi|2582641|emb|CAA67800| (X99437) splicing factor [Arabidopsis thaliana]
>gi|2980800|emb|CAA18176.1| (AL022197) splicing factor At-SRp40 [Arabidopsis
thaliana] 

- % Identity: 100 

- Alignment Length: 315 

- Location of Alignment in SEQ ID NO 253: from 1 to 309 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 254 

- Ceres seq_id 1582412 

- Location of start within SEQ ID NO 251: at 637 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 254: from 7 to 71 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 135 

- gi No. 4033468 

- Description: ARGININE/SERINE-RICH SPLICING FACTOR RSP40
>gi|2582641|emb|CAA67800| (X99437) splicing factor [Arabidopsis thaliana]
>gi|2980800|emb|CAA18176.1| (AL022197) splicing factor At-SRp40 [Arabidopsis
thaliana] 

- % Identity: 100 

- Alignment Length: 315 

- Location of Alignment in SEQ ID NO 254: from 1 to 258 

Maximum Length Sequence: 

related to: 
Clone IDs: 

256719 

34593 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 255 

- Ceres seq_id 1582416 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 256 

- Ceres seq_id 1582417 

- Location of start within SEQ ID NO 255: at 49 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 256: from 18 to 352 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 136 

- gi No. 4914445 
- Description: (AL050351) cinnamyl-alcohol dehydrogenase CAD1
[Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 360 

- Location of Alignment in SEQ ID NO 256: from 1 to 360 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 257 

- Ceres seq_id 1582418 

- Location of start within SEQ ID NO 255: at 406 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 257: from 1 to 233 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 137 

- gi No. 4914445 

- Description: (AL050351) cinnamyl-alcohol dehydrogenase CAD1
[Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 360 

- Location of Alignment in SEQ ID NO 257: from 1 to 241 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 258 

- Ceres seq_id 1582419

- Location of start within SEQ ID NO 255: at 571 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 258: from 1 to 178 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 138 

- gi No. 4914445 

- Description: (AL050351) cinnamyl-alcohol dehydrogenase CAD1
[Arabidopsis thaliana] 

- % Identity: 99.7 

- Alignment Length: 360 

- Location of Alignment in SEQ ID NO 258: from 1 to 186 

Maximum Length Sequence: 

related to: 
Clone IDs: 
12459 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 259 

- Ceres seq_id 1582420 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 260 

- Ceres seq_id 1582421 

- Location of start within SEQ ID NO 259: at 52 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Prion protein 

- Location within SEQ ID NO 260: from 48 to 124 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 139 

- gi No. 166374 

- Description: (M74190) environmental stress-induced protein 
[Medicago sativa] 

- % Identity: 70.2 

- Alignment Length: 47

- Location of Alignment in SEQ ID NO 260: from 56 to 102 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 261 

- Ceres seq_id 1582422 

- Location of start within SEQ ID NO 259: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Prion protein 

- Location within SEQ ID NO 261: from 20 to 96 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 140 

- gi No. 166374 

- Description: (M74190) environmental stress-induced protein 
[Medicago sativa] 

- % Identity: 70.2 

- Alignment Length: 47 

- Location of Alignment in SEQ ID NO 261: from 28 to 74 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 262

- Ceres seq_id 1582423

- Location of start within SEQ ID NO 259: at 191 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ELM2 domain 

- Location within SEQ ID NO 262: from 1 to 64 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

11937 

20712 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 263

- Ceres seq_id 1582548 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 264 

- Ceres seq_id 1582549 

- Location of start within SEQ ID NO 263: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 141 

- gi No. 2765837 

- Description: (Z96936) NAP16kDa protein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 118 
- Location of Alignment in SEQ ID NO 264: from 32 to 149

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 265 

- Ceres seq_id 1582550 

- Location of start within SEQ ID NO 263: at 95 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 142

- gi No. 2765837 

- Description: (Z96936) NAP16kDa protein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 265: from 1 to 118 

Maximum Length Sequence: 

related to: 
Clone IDs: 

123936 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 266 

- Ceres seq_id 1582551 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 267

- Ceres seq_id 1582552

- Location of start within SEQ ID NO 266: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 143

- gi No. 3881836 

- Description: (Z78019) Similarity to Yeast LPG22P protein 
(TR:G1151240) ; cDNA EST EMBL:T00686 comes from this gene; cDNA EST 
EMBL:C12415 comes from this gene; cDNA EST EMBL:C12728 comes from this gene;
cDNA EST EMBL:C10626 comes from this ge... 

- % Identity: 74.1 

- Alignment Length: 205 

- Location of Alignment in SEQ ID NO 267: from 45 to 249 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 268

- Ceres seq_id 1582553

- Location of start within SEQ ID NO 266: at 32 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 144

- gi No. 3881836 

- Description: (Z78019) Similarity to Yeast LPG22P protein 
(TR:G1151240) ; cDNA EST EMBL:T00686 comes from this gene; cDNA EST 
EMBL:C12415 comes from this gene; cDNA EST EMBL:C12728 comes from this gene; 
cDNA EST EMBL:C10626 comes from this ge...

- % Identity: 74.1

- Alignment Length: 205 

- Location of Alignment in SEQ ID NO 268: from 35 to 239 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 269

- Ceres seq_id 1582554

- Location of start within SEQ ID NO 266: at 789 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Acetyltransferase (GNAT) family

- Location within SEQ ID NO 269: from 154 to 302 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 145 

- gi No. 3881836 

- Description: (Z78019) Similarity to Yeast LPG22P protein 
(TR:G1151240) ; cDNA EST EMBL:T00686 comes from this gene; cDNA EST 
EMBL:C12415 comes from this gene; cDNA EST EMBL:C12728 comes from this gene;
cDNA EST EMBL:C10626 comes from this ge...

- % Identity: 74.3 

- Alignment Length: 335 

- Location of Alignment in SEQ ID NO 269: from 1 to 312 



Maximum Length Sequence:

related to: 
Clone IDs: 

122924 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 270 

- Ceres seq_id 1582559 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 271 

- Ceres seq_id 1582560

- Location of start within SEQ ID NO 270: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 'Cold-shock' DNA-binding domain

- Location within SEQ ID NO 271: from 24 to 75 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 146

- gi No. 3036806 

- Description: (AL022373) glycine-rich protein [Arabidopsis
thaliana]

- % Identity: 99.4 

- Alignment Length: 176

- Location of Alignment in SEQ ID NO 271: from 1 to 176



Maximum Length Sequence:

related to: 
Clone IDs: 

29177 

159151 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 272 

- Ceres seq_id 1582614 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 273 

- Ceres seq_id 1582615 

- Location of start within SEQ ID NO 272: at 107 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
- short chain dehydrogenase

- Location within SEQ ID NO 273: from 21 to 208 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 274 

- Ceres seq_id 1582616 

- Location of start within SEQ ID NO 272: at 485 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- short chain dehydrogenase 

- Location within SEQ ID NO 274: from 1 to 82 aa.
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 275 

- Ceres seq_id 1582617 

- Location of start within SEQ ID NO 272: at 524 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- short chain dehydrogenase 

- Location within SEQ ID NO 275: from 1 to 69 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
19958 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 276

- Ceres seq_id 1582642 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 277 

- Ceres seq_id 1582643 

- Location of start within SEQ ID NO 276: at 123 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 278

- Ceres seq_id 1582644 

- Location of start within SEQ ID NO 276: at 135 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 279

- Ceres seq_id 1582645

- Location of start within SEQ ID NO 276: at 1249 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain

- Location within SEQ ID NO 279: from 1 to 155 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
13741 
147816 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 280 

- Ceres seq_id 1582654
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 281 

- Ceres seq_id 1582655 

- Location of start within SEQ ID NO 280: at 242 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 147 

- gi No. 1363484 

- Description: IAA13 protein - Arabidopsis thaliana >gi|972929
(U18415) IAA13 [Arabidopsis thaliana] >gi|2459414 (AC002332) auxin inducible
protein, IAA13 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 246

- Location of Alignment in SEQ ID NO 281: from 1 to 246 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 282 

- Ceres seq_id 1582656

- Location of start within SEQ ID NO 280: at 260 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 148 

- gi No. 1363484 

- Description: IAA13 protein - Arabidopsis thaliana >gi|972929
(U18415) IAA13 [Arabidopsis thaliana] >gi|2459414 (AC002332) auxin inducible
protein, IAA13 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 246 

- Location of Alignment in SEQ ID NO 282: from 1 to 240 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 283 

- Ceres seq_id 1582657 

- Location of start within SEQ ID NO 280: at 500 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 149

- gi No. 1363484 

- Description: IAA13 protein - Arabidopsis thaliana >gi|972929
(U18415) IAA13 [Arabidopsis thaliana] >gi|2459414 (AC002332) auxin inducible
protein, IAA13 [Arabidopsis thaliana] 

- % Identity: 100 
- Alignment Length: 246

- Location of Alignment in SEQ ID NO 283: from 1 to 160 



Maximum Length Sequence: 

related to: 
Clone IDs: 

105146 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 284 

- Ceres seq_id 1582658 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 285 

- Ceres seq_id 1582659 

- Location of start within SEQ ID NO 284: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Leucine Rich Repeat 

- Location within SEQ ID NO 285: from 202 to 225 aa. 
(Dp) Related Amino Acid Sequences 



- Alignment No. 150 

- gi No. 2739389 

- Description: (AC002505) Cf-2.2 like protein [Arabidopsis
thaliana]

- % Identity: 74.5 

- Alignment Length: 479

- Location of Alignment in SEQ ID NO 285: from 4 to 479 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 286

- Ceres seq_id 1582660

- Location of start within SEQ ID NO 284: at 12 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Leucine Rich Repeat 

- Location within SEQ ID NO 286: from 199 to 222 aa. 
(Dp) Related Amino Acid Sequences 



- Alignment No. 151 

- gi No. 2739389 

- Description: (AC002505) Cf-2.2 like protein [Arabidopsis
thaliana]

- % Identity: 74.5 

- Alignment Length: 479

- Location of Alignment in SEQ ID NO 286: from 1 to 476

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 287 

- Ceres seq_id 1582661 

- Location of start within SEQ ID NO 284: at 609 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Leucine Rich Repeat 

- Location within SEQ ID NO 287: from 1 to 23 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 152 

- gi No. 2739389 
- Description: (AC002505) Cf-2.2 like protein [Arabidopsis
thaliana]

- % Identity: 74.5 

- Alignment Length: 479

- Location of Alignment in SEQ ID NO 287: from 1 to 277 

Maximum Length Sequence: 

related to: 
Clone IDs: 

150823 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 288 

- Ceres seq_id 1582689 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 289

- Ceres seq_id 1582690

- Location of start within SEQ ID NO 288: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glycosyl hydrolase family 9 

- Location within SEQ ID NO 289: from 77 to 535 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 153 

- gi No. 3377800 

- Description: (AF075597) similar to glycosyl hydrolases family 9 
(PFam:glycosyl_hydro5.hmm, score: 100.70) [Arabidopsis thaliana]

- % Identity: 99.2 

- Alignment Length: 516 

- Location of Alignment in SEQ ID NO 289: from 27 to 542 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 290

- Ceres seq_id 1582691

- Location of start within SEQ ID NO 288: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glycosyl hydrolase family 9 

- Location within SEQ ID NO 290: from 51 to 509 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 154 

- gi No. 3377800 

- Description: (AF075597) similar to glycosyl hydrolases family 9 
(PFam:glycosyl_hydro5.hmm, score: 100.70) [Arabidopsis thaliana] 

- % Identity: 99.2 

- Alignment Length: 516 

- Location of Alignment in SEQ ID NO 290: from 1 to 516 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 291 

- Ceres seq_id 1582692 

- Location of start within SEQ ID NO 288: at 312 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glycosyl hydrolase family 9 

- Location within SEQ ID NO 291: from 1 to 432 aa. 
(Dp) Related Amino Acid Sequences

- Alignment No. 155 

- gi No. 3377800 

- Description: (AF075597) similar to glycosyl hydrolases family 9 
(PFam:glycosyl_hydro5.hmm, score: 100.70) [Arabidopsis thaliana] 

- % Identity: 99.2 

- Alignment Length: 516 

- Location of Alignment in SEQ ID NO 291: from 1 to 439 

Maximum Length Sequence: 

related to: 
Clone IDs: 
41563 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 292 

- Ceres seq_id 1582700 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 293 

- Ceres seq_id 1582701

- Location of start within SEQ ID NO 292: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 156 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 293: from 27 to 50 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 294 

- Ceres seq_id 1582702 

- Location of start within SEQ ID NO 292: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 157 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 294: from 1 to 24 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 295

- Ceres seq_id 1582703

- Location of start within SEQ ID NO 292: at 354 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence:

related to:
Clone IDs:
41891 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 296 

- Ceres seq_id 1582712 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 297 

- Ceres seq_id 1582713 

- Location of start within SEQ ID NO 296: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 297: from 27 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 158 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 297: from 27 to 103

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 298 

- Ceres seq_id 1582714

- Location of start within SEQ ID NO 296: at 81 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 298: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 159

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B) 
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 298: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 
5024 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 299

- Ceres seq_id 1582741 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 300 

- Ceres seq_id 1582742 

- Location of start within SEQ ID NO 299: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 300: from 27 to 102 aa. 






(Dp) Related Amino Acid Sequences 

- Alignment No. 160 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 300: from 27 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 301 

- Ceres seq_id 1582743 

- Location of start within SEQ ID NO 299: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 301: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 161 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 301: from 1 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

2420 

3475 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 302 

- Ceres seq_id 1582786 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 303 

- Ceres seq_id 1582787 

- Location of start within SEQ ID NO 302: at 98 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 162 

- gi No. 1352054 

- Description: ATP SYNTHASE 6 KD SUBUNIT, MITOCHONDRIAL 

- % Identity: 72 

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 303: from 1 to 25 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 304 

- Ceres seq_id 1582788

- Location of start within SEQ ID NO 302: at 115 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences

Maximum Length Sequence: 

related to: 
Clone IDs: 

109339 

10517 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 305 

- Ceres seq_id 1582825 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 306 

- Ceres seq_id 1582826 

- Location of start within SEQ ID NO 305: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 306: from 53 to 169 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 163 

- gi No. 1346251 

- Description: HISTONE H2B.4 >gi|577819|emb|CAA49585| (X69961) H2B
histone [Zea mays] 

- % Identity: 83.9 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 306: from 54 to 170 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 307 

- Ceres seq_id 1582827 

- Location of start within SEQ ID NO 305: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 307: from 1 to 95 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 164 

- gi No. 1346251 

- Description: HISTONE H2B.4 >gi|577819|emb|CAA49585| (X69961) H2B
histone [Zea mays] 

- % Identity: 83.9 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 307: from 1 to 96 

Maximum Length Sequence: 

related to: 
Clone IDs: 

6336 

24664 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 308 

- Ceres seq_id 1582927 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 309 

- Ceres seq_id 1582928 

- Location of start within SEQ ID NO 308: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 309: from 73 to 340 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 310 

- Ceres seq_id 1582929 

- Location of start within SEQ ID NO 308: at 60 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 310: from 54 to 321 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 311

- Ceres seq_id 1582930

- Location of start within SEQ ID NO 308: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 311: from 47 to 314 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
22677 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 312

- Ceres seq_id 1582959 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 313 

- Ceres seq_id 1582960 

- Location of start within SEQ ID NO 312: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Dehydrins

- Location within SEQ ID NO 313: from 37 to 113 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 165 

- gi No. 4972049 

- Description: (AL078470) glycine-rich protein like [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 115 

- Location of Alignment in SEQ ID NO 313: from 1 to 115 

Maximum Length Sequence:

related to:
Clone IDs:
10008

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 314

- Ceres seq_id 1582997
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 315 

- Ceres seq_id 1582998 

- Location of start within SEQ ID NO 314: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 315: from 44 to 190 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 166 

- gi No. 2511588 

- Description: (Y13691) multicatalytic endopeptidase complex, 
proteasome component, alpha subunit [Arabidopsis thaliana] 

- % Identity: 99.6 

- Alignment Length: 245 

- Location of Alignment in SEQ ID NO 315: from 8 to 252 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 316 

- Ceres seq_id 1582999 

- Location of start within SEQ ID NO 314: at 21 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 316: from 38 to 184 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 167 

- gi No. 2511588 

- Description: (Y13691) multicatalytic endopeptidase complex, 
proteasome component, alpha subunit [Arabidopsis thaliana] 

- % Identity: 99.6 

- Alignment Length: 245 

- Location of Alignment in SEQ ID NO 316: from 2 to 246

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 317 

- Ceres seq_id 1583000 

- Location of start within SEQ ID NO 314: at 345 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 317: from 1 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 168 

- gi No. 2511588 

- Description: (Y13691) multicatalytic endopeptidase complex, 
proteasome component, alpha subunit [Arabidopsis thaliana] 

- % Identity: 99.6 

- Alignment Length: 245

- Location of Alignment in SEQ ID NO 317: from 1 to 138 



Maximum Length Sequence: 
related to:
Clone IDs: 
95922 
30539 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 318 

- Ceres seq_id 1583044 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 319 

- Ceres seq_id 1583045 

- Location of start within SEQ ID NO 318: at 99 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase B/B' CF(0) 

- Location within SEQ ID NO 319: from 87 to 218 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 169 

- gi No. 2864617 

- Description: (AL021811) H+-transporting ATP synthase chain9 - 
like protein [Arabidopsis thaliana] >gi|5730141|emb|CAB52473.1| (AJ245574)
ATP synthase beta chain precursor (subunit II) [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 219 

- Location of Alignment in SEQ ID NO 319: from 1 to 219 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 320 

- Ceres seq_id 1583046 

- Location of start within SEQ ID NO 318: at 117 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase B/B' CF(0)

- Location within SEQ ID NO 320: from 81 to 212 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 170

- gi No. 2864617

- Description: (AL021811) H+-transporting ATP synthase chain9 -
like protein [Arabidopsis thaliana] >gi|5730141|emb|CAB52473.1| (AJ245574)
ATP synthase beta chain precursor (subunit II) [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 219 

- Location of Alignment in SEQ ID NO 320: from 1 to 213 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 321 

- Ceres seq_id 1583047 

- Location of start within SEQ ID NO 318: at 312 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase B/B' CF(0) 

- Location within SEQ ID NO 321: from 16 to 147 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 171 

- gi No. 2864617 
- Description: (AL021811) H+-transporting ATP synthase chain9-like
protein [Arabidopsis thaliana] >gi|5730141|emb|CAB52473.1| (AJ245574)
ATP synthase beta chain precursor (subunit II) [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 219 

- Location of Alignment in SEQ ID NO 321: from 1 to 148 



Maximum Length Sequence: 

related to: 
Clone IDs: 
2662 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 322 

- Ceres seq_id 1583080 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 323 

- Ceres seq_id 1583081 

- Location of start within SEQ ID NO 322: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutamine synthetase 

- Location within SEQ ID NO 323: from 47 to 374 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 172 

- gi No. 99697 

- Description: glutamate--ammonia ligase (EC 6.3.1.2), cytosolic
(clone lambdaAtgsr2) - Arabidopsis thaliana

- % Identity: 98 

- Alignment Length: 358 

- Location of Alignment in SEQ ID NO 323: from 31 to 386 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 324 

- Ceres seq_id 1583082 

- Location of start within SEQ ID NO 322: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutamine synthetase 

- Location within SEQ ID NO 324: from 17 to 344 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 173

- gi No. 99697 

- Description: glutamate--ammonia ligase (EC 6.3.1.2), cytosolic
(clone lambdaAtgsr2) - Arabidopsis thaliana 

- % Identity: 98 

- Alignment Length: 358 

- Location of Alignment in SEQ ID NO 324: from 1 to 356 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 325 

- Ceres seq_id 1583083 

- Location of start within SEQ ID NO 322: at 181 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutamine synthetase 

- Location within SEQ ID NO 325: from 1 to 314 aa.
(Dp) Related Amino Acid Sequences

- Alignment No. 174 

- gi No. 99697 

- Description: glutamate--ammonia ligase (EC 6.3.1.2), cytosolic
(clone lambdaAtgsr2) - Arabidopsis thaliana 

- % Identity: 98 

- Alignment Length: 358 

- Location of Alignment in SEQ ID NO 325: from 1 to 326 

Maximum Length Sequence: 

related to: 
Clone IDs: 
6935 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 326 

- Ceres seq_id 1583099 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 327 

- Ceres seq_id 1583100 

- Location of start within SEQ ID NO 326: at 109 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 327: from 1 to 324 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 175 

- gi No. 135467 

- Description: TUBULIN BETA-4 CHAIN >gi|2129546|pir||S68122 beta-tubulin
4 - Arabidopsis thaliana >gi|166640 (M21415) beta-tubulin
[Arabidopsis thaliana] 

- % Identity: 99.5 

- Alignment Length: 444 

- Location of Alignment in SEQ ID NO 327: from 1 to 324 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 328 

- Ceres seq_id 1583101 

- Location of start within SEQ ID NO 326: at 304 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 328: from 1 to 259 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 176

- gi No. 135467 

- Description: TUBULIN BETA-4 CHAIN >gi|2129546|pir||S68122 beta-tubulin
4 - Arabidopsis thaliana >gi|166640 (M21415) beta-tubulin
[Arabidopsis thaliana] 

- % Identity: 99.5 

- Alignment Length: 444

- Location of Alignment in SEQ ID NO 328: from 1 to 259 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 329 

- Ceres seq_id 1583102 

- Location of start within SEQ ID NO 326: at 325 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 329: from 1 to 252 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 177

- gi No. 135467 

- Description: TUBULIN BETA-4 CHAIN >gi|2129546|pir||S68122 beta-tubulin
4 - Arabidopsis thaliana >gi|166640 (M21415) beta-tubulin
[Arabidopsis thaliana] 

- % Identity: 99.5 

- Alignment Length: 444

- Location of Alignment in SEQ ID NO 329: from 1 to 252 

Maximum Length Sequence:

related to: 
Clone IDs: 
12970 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 330 

- Ceres seq_id 1583160 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 331 

- Ceres seq_id 1583161

- Location of start within SEQ ID NO 330: at 157 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 178

- gi No. 1944132 

- Description: (AB002560) CUC2 [Arabidopsis thaliana] 

- % Identity: 78.6 

- Alignment Length: 154 

- Location of Alignment in SEQ ID NO 331: from 23 to 174 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 332 

- Ceres seq_id 1583162

- Location of start within SEQ ID NO 330: at 178 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 179

- gi No. 1944132 

- Description: (AB002560) CUC2 [Arabidopsis thaliana] 

- % Identity: 78.6 

- Alignment Length: 154 

- Location of Alignment in SEQ ID NO 332: from 16 to 167 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 333 

- Ceres seq_id 1583163 

- Location of start within SEQ ID NO 330: at 217 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
- Alignment No. 180

- gi No. 1944132 

- Description: (AB002560) CUC2 [Arabidopsis thaliana] 

- % Identity: 78.6 

- Alignment Length: 154 

- Location of Alignment in SEQ ID NO 333: from 3 to 154 

Maximum Length Sequence:

related to: 
Clone IDs: 

122389 

34635 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 334 

- Ceres seq_id 1583171 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 335 

- Ceres seq_id 1583172

- Location of start within SEQ ID NO 334: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- SRF-type transcription factor (DNA-binding and dimerisation domain)

- Location within SEQ ID NO 335: from 11 to 69 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 181 

- gi No. 543815 

- Description: FLORAL HOMEOTIC PROTEIN APETALA3
>gi|282855|pir||A42095 homeotic protein AP3 - Arabidopsis thaliana >gi|166608
(M86357) APETELA3 [Arabidopsis thaliana]

- % Identity: 99.1 

- Alignment Length: 232 

- Location of Alignment in SEQ ID NO 335: from 11 to 242 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 336 

- Ceres seq_id 1583173 

- Location of start within SEQ ID NO 334: at 33 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- SRF-type transcription factor (DNA-binding and dimerisation domain)

- Location within SEQ ID NO 336: from 1 to 59 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 182 

- gi No. 543815 

- Description: FLORAL HOMEOTIC PROTEIN APETALA3
>gi|282855|pir||A42095 homeotic protein AP3 - Arabidopsis thaliana >gi|166608
(M86357) APETELA3 [Arabidopsis thaliana]

- % Identity: 99.1 

- Alignment Length: 232 

- Location of Alignment in SEQ ID NO 336: from 1 to 232 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 337 

- Ceres seq_id 1583174 

- Location of start within SEQ ID NO 334: at 171 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 183 

- gi No. 543815 

- Description: FLORAL HOMEOTIC PROTEIN APETALA3
>gi|282855|pir||A42095 homeotic protein AP3 - Arabidopsis thaliana >gi|166608
(M86357) APETELA3 [Arabidopsis thaliana]

- % Identity: 99.1 

- Alignment Length: 232 

- Location of Alignment in SEQ ID NO 337: from 1 to 186

Maximum Length Sequence: 

related to: 
Clone IDs: 
21192 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 338 

- Ceres seq_id 1583175 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 339 

- Ceres seq_id 1583176 

- Location of start within SEQ ID NO 338: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 339: from 40 to 184 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 340 

- Ceres seq_id 1583177 

- Location of start within SEQ ID NO 338: at 260 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 340: from 1 to 124 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 341 

- Ceres seq_id 1583178 

- Location of start within SEQ ID NO 338: at 362 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 341: from 1 to 90 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

38106 

38538 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 342 

- Ceres seq_id 1583318 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 343

- Ceres seq_id 1583319

- Location of start within SEQ ID NO 342: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Uricase 

- Location within SEQ ID NO 343: from 58 to 343 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 184

- gi No. 3075395 

- Description: (AC004484) nodulin-35 homologue [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 293

- Location of Alignment in SEQ ID NO 343: from 51 to 343 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 344

- Ceres seq_id 1583320 

- Location of start within SEQ ID NO 342: at 152 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Uricase 

- Location within SEQ ID NO 344: from 8 to 293 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 185 

- gi No. 3075395 

- Description: (AC004484) nodulin-35 homologue [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 293

- Location of Alignment in SEQ ID NO 344: from 1 to 293 



Maximum Length Sequence:

related to: 
Clone IDs: 
3807 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 345

- Ceres seq_id 1583381 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 346

- Ceres seq_id 1583382

- Location of start within SEQ ID NO 345: at 38 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- LBP / BPI / CETP family 

- Location within SEQ ID NO 346: from 31 to 405 aa. 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 347

- Ceres seq_id 1583383 

- Location of start within SEQ ID NO 345: at 230 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- LBP / BPI / CETP family 

- Location within SEQ ID NO 347: from 1 to 341 aa.



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 348 

- Ceres seq_id 1583384 

- Location of start within SEQ ID NO 345: at 371 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- LBP / BPI / CETP family 

- Location within SEQ ID NO 348: from 1 to 294 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

10344 

11375 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 349

- Ceres seq_id 1583403 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 350 

- Ceres seq_id 1583404 

- Location of start within SEQ ID NO 349: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 351 

- Ceres seq_id 1583405 

- Location of start within SEQ ID NO 349: at 46 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 352 

- Ceres seq_id 1583406 

- Location of start within SEQ ID NO 349: at 381 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Helix-loop-helix DNA-binding domain 

- Location within SEQ ID NO 352: from 1 to 31 aa. 



(Dp) Related Amino Acid Sequences 
Maximum Length Sequence:

related to: 
Clone IDs: 

120540 

253173 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 353 

- Ceres seq_id 1583478
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 354 

- Ceres seq_id 1583479 

- Location of start within SEQ ID NO 353: at 121 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- O-methyltransferase

- Location within SEQ ID NO 354: from 97 to 338 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 186

- gi No. 2781394 

- Description: (U70424) O-methyltransferase 1 [Arabidopsis thaliana]

- % Identity: 99.7 

- Alignment Length: 363 

- Location of Alignment in SEQ ID NO 354: from 1 to 363 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 355 

- Ceres seq_id 1583480 

- Location of start within SEQ ID NO 353: at 190 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- O-methyltransferase 

- Location within SEQ ID NO 355: from 74 to 315 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 187 

- gi No. 2781394 

- Description: (U70424) O-methyltransferase 1 [Arabidopsis thaliana]

- % Identity: 99.7 

- Alignment Length: 363

- Location of Alignment in SEQ ID NO 355: from 1 to 340 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 356 

- Ceres seq_id 1583481 

- Location of start within SEQ ID NO 353: at 220 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- O-methyltransferase 

- Location within SEQ ID NO 356: from 64 to 305 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 188 

- gi No. 2781394 

- Description: (U70424) O-methyltransferase 1 [Arabidopsis thaliana]

- % Identity: 99.7

- Alignment Length: 363 

- Location of Alignment in SEQ ID NO 356: from 1 to 330 

Maximum Length Sequence: 

related to: 
Clone IDs: 

253505 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 357 

- Ceres seq_id 1583482 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 358 

- Ceres seq_id 1583483 

- Location of start within SEQ ID NO 357: at 206 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 358: from 83 to 360 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 189

- gi No. 2351097 

- Description: (AB006810) ATMRK1 [Arabidopsis thaliana] 

- % Identity: 99.5 

- Alignment Length: 391 

- Location of Alignment in SEQ ID NO 358: from 1 to 391 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 359

- Ceres seq_id 1583484 

- Location of start within SEQ ID NO 357: at 356 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 359: from 33 to 310 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 190 

- gi No. 2351097 

- Description: (AB006810) ATMRK1 [Arabidopsis thaliana] 

- % Identity: 99.5 

- Alignment Length: 391 

- Location of Alignment in SEQ ID NO 359: from 1 to 341

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 360

- Ceres seq_id 1583485 

- Location of start within SEQ ID NO 357: at 422 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 360: from 11 to 288 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 191 

- gi No. 2351097 

- Description: (AB006810) ATMRK1 [Arabidopsis thaliana] 
- % Identity: 99.5

- Alignment Length: 391

- Location of Alignment in SEQ ID NO 360: from 1 to 319 

Maximum Length Sequence:

related to: 
Clone IDs: 

263181 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 361 

- Ceres seq_id 1583521 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 362 

- Ceres seq_id 1583522 

- Location of start within SEQ ID NO 361: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 362: from 24 to 99 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 192 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 362: from 24 to 100 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 363 

- Ceres seq_id 1583523 

- Location of start within SEQ ID NO 361: at 70 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 363: from 1 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 193 

- gi No. 2497886 

- Description: METALLOTHIONEIN-LIKE PROTEIN 2B (MT-2B)
>gi|1361999|pir||S57862 metallothionein 2b - Arabidopsis thaliana >gi|1086463
(U11256) metallothionein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 77

- Location of Alignment in SEQ ID NO 363: from 1 to 77 



Maximum Length Sequence: 

related to: 
Clone IDs: 

153182 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 364 

- Ceres seq_id 1583528 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 365 

- Ceres seq_id 1583529 
- Location of start within SEQ ID NO 364: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 194 

- gi No. 2462746 

- Description: (AC002292) Similar to ATP-citrate-lyase [Arabidopsis thaliana]

- % Identity: 92.4 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 365: from 1 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 366 

- Ceres seq_id 1583530 

- Location of start within SEQ ID NO 364: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 195 

- gi No. 2462746 

- Description: (AC002292) Similar to ATP-citrate-lyase [Arabidopsis thaliana]

- % Identity: 92.4 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 366: from 1 to 83 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 367 

- Ceres seq_id 1583531

- Location of start within SEQ ID NO 364: at 111 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 196 

- gi No. 2462746 

- Description: (AC002292) Similar to ATP-citrate-lyase [Arabidopsis thaliana]

- % Identity: 92.4 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 367: from 1 to 68 

Maximum Length Sequence:

related to: 
Clone IDs: 

256020 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 368 

- Ceres seq_id 1583561 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 369 

- Ceres seq_id 1583562 

- Location of start within SEQ ID NO 368: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 369: from 20 to 279 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 370

- Ceres seq_id 1583563 

- Location of start within SEQ ID NO 368: at 195 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 370: from 1 to 215 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 371 

- Ceres seq_id 1583564 

- Location of start within SEQ ID NO 368: at 219 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 371: from 1 to 207 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 372 

- Ceres seq_id 1583637 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 373 

- Ceres seq_id 1583638

- Location of start within SEQ ID NO 372: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 197 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposi
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 91 

- Alignment Length: 145 

- Location of Alignment in SEQ ID NO 373: from 1 to 145 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 374

- Ceres seq_id 1583640 

- Location of start within SEQ ID NO 372: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 198 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotranspos 
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 91 
- Alignment Length: 145

- Location of Alignment in SEQ ID NO 374: from 1 to 98 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 375 

- Ceres seq_id 1583860 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 376

- Ceres seq_id 1583861 

- Location of start within SEQ ID NO 375: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 376: from 127 to 287 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 377

- Ceres seq_id 1583862

- Location of start within SEQ ID NO 375: at 44 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 377: from 113 to 273 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 378

- Ceres seq_id 1583863 

- Location of start within SEQ ID NO 375: at 50 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 378: from 111 to 271 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
94594 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 379

- Ceres seq_id 1583911 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 380

- Ceres seq_id 1583912 

- Location of start within SEQ ID NO 379: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- haloacid dehalogenase-like hydrolase 

- Location within SEQ ID NO 380: from 129 to 314 aa.
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 381 

- Ceres seq_id 1583914 

- Location of start within SEQ ID NO 379: at 157 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- haloacid dehalogenase-like hydrolase 

- Location within SEQ ID NO 381: from 77 to 262 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 382 

- Ceres seq_id 1584116 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 383 

- Ceres seq_id 1584117 

- Location of start within SEQ ID NO 382: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glycosyl hydrolase family 3 

- Location within SEQ ID NO 383: from 75 to 138 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 199 

- gi No. 3582436 

- Description: (AB017502) beta-D-glucan exohydrolase [Nicotiana tabacum]

- % Identity: 81.8 

- Alignment Length: 137 

- Location of Alignment in SEQ ID NO 383: from 1 to 137 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 384

- Ceres seq_id 1584187 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 385 

- Ceres seq_id 1584188 

- Location of start within SEQ ID NO 384: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 200 

- gi No. 4325366 

- Description: (AF128396) similar to maize transposon MuDR-like 
proteins [Arabidopsis thaliana] 

- % Identity: 91 

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 385: from 3 to 91 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 386

- Ceres seq_id 1584189

- Location of start within SEQ ID NO 384: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 387 

- Ceres seq_id 1584335 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 388 

- Ceres seq_id 1584336 

- Location of start within SEQ ID NO 387: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 201

- gi No. 2828267 

- Description: (Y14044) geranylgeranyl reductase [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 169 

- Location of Alignment in SEQ ID NO 388: from 1 to 169 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 389 

- Ceres seq_id 1584337 

- Location of start within SEQ ID NO 387: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 202 

- gi No. 2828267 

- Description: (Y14044) geranylgeranyl reductase [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 169 

- Location of Alignment in SEQ ID NO 389: from 1 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 390 

- Ceres seq_id 1584338 

- Location of start within SEQ ID NO 387: at 214 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 203 

- gi No. 2828267 

- Description: (Y14044) geranylgeranyl reductase [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 169 

- Location of Alignment in SEQ ID NO 390: from 1 to 98 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 391 

- Ceres seq_id 1584543 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 392 

- Ceres seq_id 1584544 

- Location of start within SEQ ID NO 391: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sialyltransferase family

- Location within SEQ ID NO 392: from 152 to 217 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 393

- Ceres seq_id 1584545 

- Location of start within SEQ ID NO 391: at 293 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sialyltransferase family

- Location within SEQ ID NO 393: from 55 to 120 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 394 

- Ceres seq_id 1584546 

- Location of start within SEQ ID NO 391: at 389 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sialyltransferase family

- Location within SEQ ID NO 394: from 23 to 88 aa. 



Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 395 

- Ceres seq_id 1585005 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 396 

- Ceres seq_id 1585006 

- Location of start within SEQ ID NO 395: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 396: from 97 to 348 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 397 

- Ceres seq_id 1585007 

- Location of start within SEQ ID NO 395: at 264 nt.



(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 397: from 10 to 261 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 398 

- Ceres seq_id 1585008 

- Location of start within SEQ ID NO 395: at 492 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- alpha/beta hydrolase fold 

- Location within SEQ ID NO 398: from 1 to 185 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 399 

- Ceres seq_id 1585020
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 400 

- Ceres seq_id 1585021 

- Location of start within SEQ ID NO 399: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 400: from 279 to 416 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 204

- gi No. 3834302 

- Description: (AC005679) Similar to gb|D45384 vacuolar H+- 
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from 
this gene. [Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 519 

- Location of Alignment in SEQ ID NO 400: from 1 to 519 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 401 

- Ceres seq_id 1585022 

- Location of start within SEQ ID NO 399: at 5 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 401: from 278 to 415 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 205 

- gi No. 3834302 

- Description: (AC005679) Similar to gb|D45384 vacuolar H+-
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from
this gene. [Arabidopsis thaliana]

- % Identity: 91.7

- Alignment Length: 519

- Location of Alignment in SEQ ID NO 401: from 1 to 518 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 402 

- Ceres seq_id 1585023 

- Location of start within SEQ ID NO 399: at 62 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 402: from 259 to 396 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 206

- gi No. 3834302 

- Description: (AC005679) Similar to gb|D45384 vacuolar H+-
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from
this gene. [Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 519 

- Location of Alignment in SEQ ID NO 402: from 1 to 499 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 403

- Ceres seq_id 1585024 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 404 

- Ceres seq_id 1585025

- Location of start within SEQ ID NO 403: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 404: from 279 to 416 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 207 

- gi No. 3834302 

- Description: (AC005679) Similar to gb|D45384 vacuolar H+- 
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from 
this gene. [Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 519 

- Location of Alignment in SEQ ID NO 404: from 1 to 519 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 405 

- Ceres seq_id 1585026 

- Location of start within SEQ ID NO 403: at 5 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 405: from 278 to 415 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 208

- gi No. 3834302

- Description: (AC005679) Similar to gb|D45384 vacuolar H+-
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from
this gene. [Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 519 

- Location of Alignment in SEQ ID NO 405: from 1 to 518 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 406

- Ceres seq_id 1585027 

- Location of start within SEQ ID NO 403: at 62 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NADH-ubiquinone/plastoquinone oxidoreductase chain 6 

- Location within SEQ ID NO 406: from 259 to 396 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 209 

- gi No. 3834302 

- Description: (AC005679) Similar to gb|D45384 vacuolar H+-
pyrophosphatase from Oryza sativa. ESTs gb|F14272 and gb|F14273 come from
this gene. [Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 519 

- Location of Alignment in SEQ ID NO 406: from 1 to 499 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 407

- Ceres seq_id 1585047 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 408 

- Ceres seq_id 1585048 

- Location of start within SEQ ID NO 407: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 210 

- gi No. 3695397 

- Description: (AF096372) No definition line found [Arabidopsis thaliana]

- % Identity: 86.4 

- Alignment Length: 345 

- Location of Alignment in SEQ ID NO 408: from 1 to 339 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 409

- Ceres seq_id 1585050 

- Location of start within SEQ ID NO 407: at 439 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 211 

- gi No. 3695397 

- Description: (AF096372) No definition line found [Arabidopsis thaliana]

- % Identity: 86.4

- Alignment Length: 345

- Location of Alignment in SEQ ID NO 409: from 1 to 193 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 410 

- Ceres seq_id 1585207 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 411 

- Ceres seq_id 1585208 

- Location of start within SEQ ID NO 410: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 212 

- gi No. 3695395 

- Description: (AF096372) contains similarity to reverse 
transcriptase (Pfam: PF00078 rvt, E=4.3e-08) [Arabidopsis thaliana]

- % Identity: 74.3 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 411: from 317 to 429 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 412 

- Ceres seq_id 1585210 

- Location of start within SEQ ID NO 410: at 247 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 213 

- gi No. 3695395 

- Description: (AF096372) contains similarity to reverse 
transcriptase (Pfam: PF00078 rvt, E=4.3e-08) [Arabidopsis thaliana] 

- % Identity: 74.3 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 412: from 235 to 347 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 413 

- Ceres seq_id 1585238 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 414 

- Ceres seq_id 1585239 

- Location of start within SEQ ID NO 413: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450

- Location within SEQ ID NO 414: from 20 to 220 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 415 

- Ceres seq_id 1585240 
- Location of start within SEQ ID NO 413: at 150 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450 

- Location within SEQ ID NO 415: from 1 to 171 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 416 

- Ceres seq_id 1585241 

- Location of start within SEQ ID NO 413: at 324 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450 

- Location within SEQ ID NO 416: from 1 to 113 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 417 

- Ceres seq_id 1585308 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 418 

- Ceres seq_id 1585309

- Location of start within SEQ ID NO 417: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc finger, C3HC4 type (RING finger) 

- Location within SEQ ID NO 418: from 94 to 135 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 419 

- Ceres seq_id 1585351 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 420

- Ceres seq_id 1585352

- Location of start within SEQ ID NO 419: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 214 

- gi No. 3695397 

- Description: (AF096372) No definition line found [Arabidopsis thaliana]

- % Identity: 89.8

- Alignment Length: 157

- Location of Alignment in SEQ ID NO 420: from 1 to 154

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 421

- Ceres seq_id 1585353

- Location of start within SEQ ID NO 419: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 422

- Ceres seq_id 1585458 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 423 

- Ceres seq_id 1585459 

- Location of start within SEQ ID NO 422: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Prion protein 

- Location within SEQ ID NO 423: from 16 to 131 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 215 

- gi No. 5107374 

- Description: (AF154272) PINHEAD [Arabidopsis thaliana]

- % Identity: 76.4

- Alignment Length: 424

- Location of Alignment in SEQ ID NO 423: from 326 to 748

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 424

- Ceres seq_id 1585460

- Location of start within SEQ ID NO 422: at 9 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Prion protein

- Location within SEQ ID NO 424: from 14 to 129 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 216

- gi No. 5107374

- Description: (AF154272) PINHEAD [Arabidopsis thaliana]

- % Identity: 76.4

- Alignment Length: 424

- Location of Alignment in SEQ ID NO 424: from 324 to 746

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 425

- Ceres seq_id 1585461

- Location of start within SEQ ID NO 422: at 549 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 217

- gi No. 5107374

- Description: (AF154272) PINHEAD [Arabidopsis thaliana]

- % Identity: 76.4

- Alignment Length: 424

- Location of Alignment in SEQ ID NO 425: from 144 to 566
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 426 

- Ceres seq_id 1585462 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 427

- Ceres seq_id 1585463

- Location of start within SEQ ID NO 426: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Prion protein 

- Location within SEQ ID NO 427: from 14 to 129 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 218 

- gi No. 5107374 

- Description: (AF154272) PINHEAD [Arabidopsis thaliana] 

- % Identity: 74.3 

- Alignment Length: 405

- Location of Alignment in SEQ ID NO 427: from 321 to 724 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 428 

- Ceres seq_id 1585465 

- Location of start within SEQ ID NO 426: at 541 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 219 

- gi No. 5107374 

- Description: (AF154272) PINHEAD [Arabidopsis thaliana] 

- % Identity: 74.3 

- Alignment Length: 405

- Location of Alignment in SEQ ID NO 428: from 141 to 544 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 429

- Ceres seq_id 1585469 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 430 

- Ceres seq_id 1585470 

- Location of start within SEQ ID NO 429: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 220 

- gi No. 4773913 

- Description: (AF147259) No definition line found [Arabidopsis thaliana]

- % Identity: 77.1

- Alignment Length: 48

- Location of Alignment in SEQ ID NO 430: from 3 to 50

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 431 

- Ceres seq_id 1585471 

- Location of start within SEQ ID NO 429: at 233 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 432

- Ceres seq_id 1585624 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 433

- Ceres seq_id 1585625

- Location of start within SEQ ID NO 432: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 221 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 94.9 

- Alignment Length: 197 

- Location of Alignment in SEQ ID NO 433: from 1 to 197



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 434 

- Ceres seq_id 1585627 

- Location of start within SEQ ID NO 432: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 222 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 94.9 

- Alignment Length: 197 

- Location of Alignment in SEQ ID NO 434: from 1 to 150 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 435 

- Ceres seq_id 1585636 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 436 

- Ceres seq_id 1585637 

- Location of start within SEQ ID NO 435: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 436: from 28 to 229 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 437 

- Ceres seq_id 1585638 

- Location of start within SEQ ID NO 435: at 127 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 437: from 1 to 187 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 438

- Ceres seq_id 1585642
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 439 

- Ceres seq_id 1585643 

- Location of start within SEQ ID NO 438: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Cyclin 

- Location within SEQ ID NO 439: from 113 to 208 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 223 

- gi No. 2146728 

- Description: cyclin cyclb - Arabidopsis thaliana >gi|1360646
(L27223) cyclin [Arabidopsis thaliana] 

- % Identity: 79 

- Alignment Length: 176

- Location of Alignment in SEQ ID NO 439: from 52 to 208 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 440 

- Ceres seq_id 1585644 

- Location of start within SEQ ID NO 438: at 51 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cyclin 

- Location within SEQ ID NO 440: from 97 to 192 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 224 

- gi No. 2146728 

- Description: cyclin cyclb - Arabidopsis thaliana >gi|1360646
(L27223) cyclin [Arabidopsis thaliana]

- % Identity: 79

- Alignment Length: 176

- Location of Alignment in SEQ ID NO 440: from 36 to 192 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 441 

- Ceres seq_id 1585645 

- Location of start within SEQ ID NO 438: at 270 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cyclin 

- Location within SEQ ID NO 441: from 24 to 119 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 225 

- gi No. 2146728 

- Description: cyclin cyclb - Arabidopsis thaliana >gi|1360646
(L27223) cyclin [Arabidopsis thaliana]

- % Identity: 79

- Alignment Length: 176

- Location of Alignment in SEQ ID NO 441: from 1 to 119 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 442 

- Ceres seq_id 1585650 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 443

- Ceres seq_id 1585651 

- Location of start within SEQ ID NO 442: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 226

- gi No. 3176670 

- Description: (AC004393) Contains similarity to 41.9 KD protein 
SLL0898 gb|1001369 from sequence of Synechocystis sp. gb|D64006. [Arabidopsis
thaliana] 

- % Identity: 83.8 

- Alignment Length: 341 

- Location of Alignment in SEQ ID NO 443: from 1 to 336 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 444 

- Ceres seq_id 1585653 

- Location of start within SEQ ID NO 442: at 271 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 227 

- gi No. 3176670 

- Description: (AC004393) Contains similarity to 41.9 KD protein 
SLL0898 gb|1001369 from sequence of Synechocystis sp. gb|D64006. [Arabidopsis
thaliana] 

- % Identity: 83.8 

- Alignment Length: 341 

- Location of Alignment in SEQ ID NO 444: from 1 to 246
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 445

- Ceres seq_id 1585658 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 446 
- Ceres seq_id 1585659

- Location of start within SEQ ID NO 445: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 228

- gi No. 4585988 

- Description: (AC005287) Similar to phosphoprotein phosphatase 2A 
regulatory subunit [Arabidopsis thaliana] 

- % Identity: 88.7 

- Alignment Length: 213 

- Location of Alignment in SEQ ID NO 446: from 1 to 213 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 447 

- Ceres seq_id 1585660 

- Location of start within SEQ ID NO 445: at 49 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 229

- gi No. 4585988

- Description: (AC005287) Similar to phosphoprotein phosphatase 2A
regulatory subunit [Arabidopsis thaliana] 

- % Identity: 88.7 

- Alignment Length: 213 

- Location of Alignment in SEQ ID NO 447: from 1 to 197 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 448

- Ceres seq_id 1585661

- Location of start within SEQ ID NO 445: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 230 

- gi No. 4585988 

- Description: (AC005287) Similar to phosphoprotein phosphatase 2A 
regulatory subunit [Arabidopsis thaliana] 

- % Identity: 88.7 

- Alignment Length: 213 

- Location of Alignment in SEQ ID NO 448: from 1 to 166 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 449

- Ceres seq_id 1585675 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 450 

- Ceres seq_id 1585676 

- Location of start within SEQ ID NO 449: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 231 

- gi No. 3695399 
- Description: (AF096372) contains similarity to Arabidopsis
thaliana retrotransposon Athila (GB:X81801) [Arabidopsis thaliana] 

- % Identity: 86 

- Alignment Length: 171 

- Location of Alignment in SEQ ID NO 450: from 149 to 319 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 451 

- Ceres seq_id 1585682 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 452 

- Ceres seq_id 1585683 

- Location of start within SEQ ID NO 451: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
- Alignment No. 232

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 71.1 

- Alignment Length: 402

- Location of Alignment in SEQ ID NO 452: from 29 to 415 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 453 

- Ceres seq_id 1585685 

- Location of start within SEQ ID NO 451: at 520 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 233 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 71.1 

- Alignment Length: 402

- Location of Alignment in SEQ ID NO 453: from 1 to 242 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 454 

- Ceres seq_id 1585695 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 455 

- Ceres seq_id 1585696 

- Location of start within SEQ ID NO 454: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- tRNA synthetases class II (G, H, P and S) 

- Location within SEQ ID NO 455: from 53 to 155 aa.

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 456

- Ceres seq_id 1585698

- Location of start within SEQ ID NO 454: at 157 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- tRNA synthetases class II (G, H, P and S) 

- Location within SEQ ID NO 456: from 1 to 103 aa.

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 457 

- Ceres seq_id 1585732 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 458 

- Ceres seq_id 1585733 

- Location of start within SEQ ID NO 457: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 234 

- gi No. 2160692 

- Description: (U73527) B' regulatory subunit of PP2A [Arabidopsis thaliana]

- % Identity: 97.7 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 458: from 1 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 459

- Ceres seq_id 1585734

- Location of start within SEQ ID NO 457: at 163 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 235 

- gi No. 2160692 

- Description: (U73527) B' regulatory subunit of PP2A [Arabidopsis thaliana]

- % Identity: 97.7

- Alignment Length: 87

- Location of Alignment in SEQ ID NO 459: from 1 to 33 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 460

- Ceres seq_id 1585735

- Location of start within SEQ ID NO 457: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 236 

- gi No. 2160692 

- Description: (U73527) B' regulatory subunit of PP2A [Arabidopsis thaliana]

- % Identity: 97.7

- Alignment Length: 87

- Location of Alignment in SEQ ID NO 460: from 1 to 31



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 461

- Ceres seq_id 1585740 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 462

- Ceres seq_id 1585741

- Location of start within SEQ ID NO 461: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L11

- Location within SEQ ID NO 462: from 17 to 148 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 237 

- gi No. 400983 

- Description: 50S RIBOSOMAL PROTEIN L11, CHLOROPLAST PRECURSOR
(CL11) >gi|279648|pir||R5SP11 ribosomal protein L11 precursor - spinach
>gi|21313|emb|CAA39950| (X56615) ribosomal protein L11 [Spinacia oleracea]

- % Identity: 74.9

- Alignment Length: 226

- Location of Alignment in SEQ ID NO 462: from 1 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 463

- Ceres seq_id 1585742

- Location of start within SEQ ID NO 461: at 353 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L11

- Location within SEQ ID NO 463: from 1 to 105 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 238

- gi No. 400983

- Description: 50S RIBOSOMAL PROTEIN L11, CHLOROPLAST PRECURSOR
(CL11) >gi|279648|pir||R5SP11 ribosomal protein L11 precursor - spinach
>gi|21313|emb|CAA39950| (X56615) ribosomal protein L11 [Spinacia oleracea]

- % Identity: 74.9

- Alignment Length: 226

- Location of Alignment in SEQ ID NO 463: from 1 to 117



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 464

- Ceres seq_id 1585784
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 465

- Ceres seq_id 1585785

- Location of start within SEQ ID NO 464: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Helicases conserved C-terminal domain

- Location within SEQ ID NO 465: from 2 to 85 aa.
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 466

- Ceres seq_id 1585786

- Location of start within SEQ ID NO 464: at 15 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Helicases conserved C-terminal domain

- Location within SEQ ID NO 466: from 1 to 81 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 467

- Ceres seq_id 1585787

- Location of start within SEQ ID NO 464: at 57 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Helicases conserved C-terminal domain

- Location within SEQ ID NO 467: from 1 to 67 aa.

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 468 

- Ceres seq_id 1585887 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 469

- Ceres seq_id 1585888

- Location of start within SEQ ID NO 468: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 239 

- gi No. 5668783 

- Description: (AC007894) F21H2.8 [Arabidopsis thaliana] 

- % Identity: 89.5

- Alignment Length: 171 

- Location of Alignment in SEQ ID NO 469: from 1 to 171 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 470 

- Ceres seq_id 1585890 

- Location of start within SEQ ID NO 468: at 172 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 240

- gi No. 5668783 

- Description: (AC007894) F21H2.8 [Arabidopsis thaliana] 

- % Identity: 89.5 

- Alignment Length: 171 
- Location of Alignment in SEQ ID NO 470: from 1 to 114
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 471 

- Ceres seq_id 1585950 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 472 

- Ceres seq_id 1585951 

- Location of start within SEQ ID NO 471: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 241 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 226

- Location of Alignment in SEQ ID NO 472: from 1 to 225

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 473

- Ceres seq_id 1585952

- Location of start within SEQ ID NO 471: at 34 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 242 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 226 

- Location of Alignment in SEQ ID NO 473: from 1 to 214 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 474

- Ceres seq_id 1585953

- Location of start within SEQ ID NO 471: at 307 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 243 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 226 

- Location of Alignment in SEQ ID NO 474: from 1 to 123 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 475

- Ceres seq_id 1586052
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 476

- Ceres seq_id 1586053

- Location of start within SEQ ID NO 475: at 1 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 244 

- gi No. 1330342 

- Description: (U58755) C34D4.11 gene product [Caenorhabditis elegans]

- % Identity: 72.4

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 476: from 187 to 215 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 477 

- Ceres seq_id 1586055

- Location of start within SEQ ID NO 475: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 245 

- gi No. 1330342 

- Description: (U58755) C34D4.11 gene product [Caenorhabditis elegans]

- % Identity: 72.4 

- Alignment Length: 29 

- Location of Alignment in SEQ ID NO 477: from 131 to 159 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 478 

- Ceres seq_id 1586145 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 479 

- Ceres seq_id 1586146 

- Location of start within SEQ ID NO 478: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cys/Met metabolism PLP-dependent enzyme 

- Location within SEQ ID NO 479: from 1 to 164 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 246

- gi No. 1742961

- Description: (X94756) cystathionine gamma-synthase [Arabidopsis thaliana]

- % Identity: 81.8

- Alignment Length: 165

- Location of Alignment in SEQ ID NO 479: from 1 to 164

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 480 

- Ceres seq_id 1586147 

- Location of start within SEQ ID NO 478: at 232 nt. 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cys/Met metabolism PLP-dependent enzyme

- Location within SEQ ID NO 480: from 1 to 87 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 247 

- gi No. 1742961 

- Description: (X94756) cystathionine gamma-synthase [Arabidopsis thaliana]

- % Identity: 81.8

- Alignment Length: 165

- Location of Alignment in SEQ ID NO 480: from 1 to 87
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 481 

- Ceres seq_id 1586148 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 482 

- Ceres seq_id 1586149

- Location of start within SEQ ID NO 481: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 248 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 98

- Location of Alignment in SEQ ID NO 482: from 1 to 98
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 483 

- Ceres seq_id 1586187 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 484 

- Ceres seq_id 1586188 

- Location of start within SEQ ID NO 483: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 484: from 5 to 263 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 249

- gi No. 1076385

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 99.6

- Alignment Length: 264

- Location of Alignment in SEQ ID NO 484: from 5 to 268 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 485

- Ceres seq_id 1586190

- Location of start within SEQ ID NO 483: at 190 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain

- Location within SEQ ID NO 485: from 1 to 200 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 250 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 99.6

- Alignment Length: 264

- Location of Alignment in SEQ ID NO 485: from 1 to 205



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 486 

- Ceres seq_id 1586191 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 487 

- Ceres seq_id 1586192 

- Location of start within SEQ ID NO 486: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 487: from 5 to 237 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 251 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 99.6 

- Alignment Length: 233 

- Location of Alignment in SEQ ID NO 487: from 5 to 237 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 488 

- Ceres seq_id 1586194

- Location of start within SEQ ID NO 486: at 190 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 488: from 1 to 174 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 252 

- gi No. 1076385 

- Description: protein kinase (EC 2.7.1.37) tousled - Arabidopsis 
thaliana >gi|433052 (L23985) protein kinase [Arabidopsis thaliana]

- % Identity: 99.6 

- Alignment Length: 233 

- Location of Alignment in SEQ ID NO 488: from 1 to 174 



Maximum Length Sequence: 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 489 

- Ceres seq_id 1586195 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 490 

- Ceres seq_id 1586196 

- Location of start within SEQ ID NO 489: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Integral membrane protein 

- Location within SEQ ID NO 490: from 53 to 182 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 491

- Ceres seq_id 1586197 

- Location of start within SEQ ID NO 489: at 160 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Integral membrane protein 

- Location within SEQ ID NO 491: from 1 to 129 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 492

- Ceres seq_id 1586198 

- Location of start within SEQ ID NO 489: at 220 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Integral membrane protein 

- Location within SEQ ID NO 492: from 1 to 109 aa. 



Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 493 

- Ceres seq_id 1586199 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 494

- Ceres seq_id 1586200 

- Location of start within SEQ ID NO 493: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Integral membrane protein 

- Location within SEQ ID NO 494: from 53 to 182 aa.
(Dp) Related Amino Acid Sequences 



- Alignment No. 253 

- gi No. 3281846 

- Description: (AJ006404) late elongated hypocotyl [Arabidopsis
thaliana]

- % Identity: 100

- Alignment Length: 50

- Location of Alignment in SEQ ID NO 494: from 246 to 295

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 495

- Ceres seq_id 1586201 

- Location of start within SEQ ID NO 493: at 160 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Integral membrane protein 

- Location within SEQ ID NO 495: from 1 to 129 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 254 

- gi No. 3281846 

- Description: (AJ006404) late elongated hypocotyl [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 50 

- Location of Alignment in SEQ ID NO 495: from 193 to 242

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 496 

- Ceres seq_id 1586202 

- Location of start within SEQ ID NO 493: at 220 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Integral membrane protein 

- Location within SEQ ID NO 496: from 1 to 109 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 255 

- gi No. 3281846 

- Description: (AJ006404) late elongated hypocotyl [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 50

- Location of Alignment in SEQ ID NO 496: from 173 to 222 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 497 

- Ceres seq_id 1586207 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 498

- Ceres seq_id 1586208 

- Location of start within SEQ ID NO 497: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- BTB/POZ domain 

- Location within SEQ ID NO 498: from 59 to 229 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 499

- Ceres seq_id 1586210 
- Location of start within SEQ ID NO 497: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- BTB/POZ domain 

- Location within SEQ ID NO 499: from 25 to 195 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 500 

- Ceres seq_id 1586296
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 501 

- Ceres seq_id 1586297 

- Location of start within SEQ ID NO 500: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Aldose 1-epimerase 

- Location within SEQ ID NO 501: from 1 to 157 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 256 

- gi No. 4741197 

- Description: (AL049746) aldose 1-epimerase-like protein 
[Arabidopsis thaliana] 

- % Identity: 98.1 

- Alignment Length: 159

- Location of Alignment in SEQ ID NO 501: from 1 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 502 

- Ceres seq_id 1586298 

- Location of start within SEQ ID NO 500: at 38 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Aldose 1-epimerase 

- Location within SEQ ID NO 502: from 1 to 145 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 257 

- gi No. 4741197 

- Description: (AL049746) aldose 1-epimerase-like protein 
[Arabidopsis thaliana] 

- % Identity: 98.1 

- Alignment Length: 159 

- Location of Alignment in SEQ ID NO 502: from 1 to 147 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 503 

- Ceres seq_id 1586299 

- Location of start within SEQ ID NO 500: at 71 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Aldose 1-epimerase 

- Location within SEQ ID NO 503: from 1 to 134 aa. 
(Dp) Related Amino Acid Sequences

- Alignment No. 258 

- gi No. 4741197 

- Description: (AL049746) aldose 1-epimerase-like protein
[Arabidopsis thaliana] 

- % Identity: 98.1 

- Alignment Length: 159

- Location of Alignment in SEQ ID NO 503: from 1 to 136 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 504 

- Ceres seq_id 1586345
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 505

- Ceres seq_id 1586346 

- Location of start within SEQ ID NO 504: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peroxidase 

- Location within SEQ ID NO 505: from 67 to 140 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 259

- gi No. 99735 

- Description: L-ascorbate peroxidase (EC 1.11.1.11) precursor 
Arabidopsis thaliana (fragment) 

- % Identity: 97.9 

- Alignment Length: 142

- Location of Alignment in SEQ ID NO 505: from 1 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 506 

- Ceres seq_id 1586347 

- Location of start within SEQ ID NO 504: at 15 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peroxidase 

- Location within SEQ ID NO 506: from 63 to 136 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 260 

- gi No. 99735 

- Description: L-ascorbate peroxidase (EC 1.11.1.11) precursor 
Arabidopsis thaliana (fragment) 

- % Identity: 97.9 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 506: from 1 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 507 

- Ceres seq_id 1586348 

- Location of start within SEQ ID NO 504: at 120 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Peroxidase 
- Location within SEQ ID NO 507: from 28 to 101 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 261

- gi No. 99735 

- Description: L-ascorbate peroxidase (EC 1.11.1.11) precursor 
Arabidopsis thaliana (fragment) 

- % Identity: 97.9 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 507: from 1 to 101 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 508 

- Ceres seq_id 1586393 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 509 

- Ceres seq_id 1586394 

- Location of start within SEQ ID NO 508: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 509: from 266 to 331 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 510 

- Ceres seq_id 1586396 

- Location of start within SEQ ID NO 508: at 4 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 510: from 265 to 330 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 511 

- Ceres seq_id 1586438 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 512 

- Ceres seq_id 1586439 

- Location of start within SEQ ID NO 511: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 262 

- gi No. 3327392 

- Description: (AC004483) reverse-transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 512: from 122 to 133 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 513 

- Ceres seq_id 1586440 

- Location of start within SEQ ID NO 511: at 17 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 263

- gi No. 3327392 

- Description: (AC004483) reverse-transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 513: from 117 to 128 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 514 

- Ceres seq_id 1586441

- Location of start within SEQ ID NO 511: at 71 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 
- Alignment No. 264

- gi No. 3327392 

- Description: (AC004483) reverse-transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 91.7 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 514: from 99 to 110 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 515 

- Ceres seq_id 1586467 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 516 

- Ceres seq_id 1586468

- Location of start within SEQ ID NO 515: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 516: from 15 to 54 aa. 

(Dp) Related Amino Acid Sequences 
- Alignment No. 265

- gi No. 1732513 

- Description: (U62743) snapdragon myb protein 305 homolog 
[Arabidopsis thaliana] 

- % Identity: 77.4 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 516: from 33 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 517 

- Ceres seq_id 1586469 

- Location of start within SEQ ID NO 515: at 91 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 266

- gi No. 1732513 

- Description: (U62743) snapdragon myb protein 305 homolog 
[Arabidopsis thaliana] 

- % Identity: 77.4 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 517: from 3 to 33 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 518 

- Ceres seq_id 1586470 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 519 

- Ceres seq_id 1586471 

- Location of start within SEQ ID NO 518: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 519: from 15 to 54 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 267

- gi No. 1732513 

- Description: (U62743) snapdragon myb protein 305 homolog 
[Arabidopsis thaliana] 

- % Identity: 77.4 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 519: from 33 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 520 

- Ceres seq_id 1586472 

- Location of start within SEQ ID NO 518: at 91 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 268

- gi No. 1732513 

- Description: (U62743) snapdragon myb protein 305 homolog 
[Arabidopsis thaliana] 

- % Identity: 77.4 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 520: from 3 to 33 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 521 

- Ceres seq_id 1586546 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 522 

- Ceres seq_id 1586547 

- Location of start within SEQ ID NO 521: at 1 nt. 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- TPR Domain 

- Location within SEQ ID NO 522: from 75 to 103 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 269

- gi No. 5281051 

- Description: (AL080318) stress-induced protein sti1-like protein
[Arabidopsis thaliana] 

- % Identity: 91.8 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 522: from 1 to 120 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 523 

- Ceres seq_id 1586548

- Location of start within SEQ ID NO 521: at 55 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- TPR Domain 

- Location within SEQ ID NO 523: from 57 to 85 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 270

- gi No. 5281051 

- Description: (AL080318) stress-induced protein sti1-like protein
[Arabidopsis thaliana] 

- % Identity: 91.8 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 523: from 1 to 102 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 524

- Ceres seq_id 1586549

- Location of start within SEQ ID NO 521: at 118 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- TPR Domain 

- Location within SEQ ID NO 524: from 36 to 64 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 271 

- gi No. 5281051 

- Description: (AL080318) stress-induced protein sti1-like protein
[Arabidopsis thaliana] 

- % Identity: 91.8 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 524: from 1 to 81 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 525 

- Ceres seq_id 1586602 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 526 

- Ceres seq_id 1586603 

- Location of start within SEQ ID NO 525: at 2 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- E1-E2 ATPases 

- Location within SEQ ID NO 526: from 583 to 669 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 527 

- Ceres seq_id 1586604 

- Location of start within SEQ ID NO 525: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- E1-E2 ATPases 

- Location within SEQ ID NO 527: from 556 to 642 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 528 

- Ceres seq_id 1586605 

- Location of start within SEQ ID NO 525: at 260 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- E1-E2 ATPases 

- Location within SEQ ID NO 528: from 497 to 583 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 529 

- Ceres seq_id 1586689 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 530 

- Ceres seq_id 1586690 

- Location of start within SEQ ID NO 529: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 272

- gi No. 2134213 

- Description: protamine I - American alligator 

- % Identity: 75 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 530: from 23 to 38 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 531 

- Ceres seq_id 1586691 

- Location of start within SEQ ID NO 529: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 273

- gi No. 392799

- Description: (U00691) G5/D6 ORF [Dictyostelium discoideum]

- % Identity: 76

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 531: from 1 to 25 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 532 

- Ceres seq_id 1586696
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 533 

- Ceres seq_id 1586697

- Location of start within SEQ ID NO 532: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 274

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 79.7 

- Alignment Length: 188

- Location of Alignment in SEQ ID NO 533: from 59 to 234 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 534 

- Ceres seq_id 1586699 

- Location of start within SEQ ID NO 532: at 40 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 275

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 79.7 

- Alignment Length: 188

- Location of Alignment in SEQ ID NO 534: from 46 to 221 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 535 

- Ceres seq_id 1587283 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 536 

- Ceres seq_id 1587284 

- Location of start within SEQ ID NO 535: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 276

- gi No. 3695382 

- Description: (AF096370) contains similarity to Arabidopsis 
thaliana retrotransposon Ta11-1 (GB:L47193) [Arabidopsis thaliana]

- % Identity: 71 
- Alignment Length: 186

- Location of Alignment in SEQ ID NO 536: from 8 to 193

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 537 

- Ceres seq_id 1587286 

- Location of start within SEQ ID NO 535: at 133 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 277

- gi No. 3695382 

- Description: (AF096370) contains similarity to Arabidopsis 
thaliana retrotransposon Ta11-1 (GB:L47193) [Arabidopsis thaliana]

- % Identity: 71 

- Alignment Length: 186 

- Location of Alignment in SEQ ID NO 537: from 1 to 149 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 538 

- Ceres seq_id 1587308
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 539 

- Ceres seq_id 1587309

- Location of start within SEQ ID NO 538: at 181 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 278

- gi No. 2244924 

- Description: (Z97339) glutaredoxin [Arabidopsis thaliana] 

- % Identity: 74.5 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 539: from 1 to 102 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 540 

- Ceres seq_id 1587310 

- Location of start within SEQ ID NO 538: at 187 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 279

- gi No. 2244924 

- Description: (Z97339) glutaredoxin [Arabidopsis thaliana] 

- % Identity: 74.5 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 540: from 1 to 100

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 541 

- Ceres seq_id 1587311

- Location of start within SEQ ID NO 538: at 199 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences

- Alignment No. 280

- gi No. 2244924 

- Description: (Z97339) glutaredoxin [Arabidopsis thaliana] 

- % Identity: 74.5 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 541: from 1 to 96 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 542 

- Ceres seq_id 1587406 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 543 

- Ceres seq_id 1587407 

- Location of start within SEQ ID NO 542: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 544 

- Ceres seq_id 1587409 

- Location of start within SEQ ID NO 542: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 281 

- gi No. 3021336 

- Description: (AJ224957) RGA-like [Arabidopsis thaliana]

- % Identity: 83.3 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 544: from 3 to 32 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 545 

- Ceres seq_id 1587537
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 546

- Ceres seq_id 1587538 

- Location of start within SEQ ID NO 545: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Integrase 

- Location within SEQ ID NO 546: from 27 to 131 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 547 

- Ceres seq_id 1587539 

- Location of start within SEQ ID NO 545: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 548 

- Ceres seq_id 1587540
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 549

- Ceres seq_id 1587541 

- Location of start within SEQ ID NO 548: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 282 

- gi No. 3377846 

- Description: (AF076274) No definition line found [Arabidopsis
thaliana]

- % Identity: 83.6 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 549: from 620 to 685 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 550 

- Ceres seq_id 1587543 

- Location of start within SEQ ID NO 548: at 262 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 283

- gi No. 3377846 

- Description: (AF076274) No definition line found [Arabidopsis
thaliana]

- % Identity: 83.6 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 550: from 533 to 598 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 551 

- Ceres seq_id 1587563 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 552 

- Ceres seq_id 1587564

- Location of start within SEQ ID NO 551: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 284

- gi No. 1086591 

- Description: (U41007) similar to S. cervisiae nuclear protein 
SNF2 (SP:P22082) in a a region of gly-arg repeats [Caenorhabditis elegans] 

- % Identity: 81.3 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 552: from 98 to 113 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 553

- Ceres seq_id 1587565 

- Location of start within SEQ ID NO 551: at 7 nt . 



(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 285 

- gi No. 1086591 

- Description: (U41007) similar to S. cervisiae nuclear protein 
SNF2 (SP:P22082) in a a region of gly-arg repeats [Caenorhabditis elegans] 

- % Identity: 81.3 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 553: from 96 to 111 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 554 

- Ceres seq_id 1587566 

- Location of start within SEQ ID NO 551: at 64 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 286

- gi No. 1086591 

- Description: (U41007) similar to S. cervisiae nuclear protein
SNF2 (SP:P22082) in a a region of gly-arg repeats [Caenorhabditis elegans] 

- % Identity: 81.3

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 554: from 77 to 92 



Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 555 

- Ceres seq_id 1587579 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 556

- Ceres seq_id 1587580 

- Location of start within SEQ ID NO 555: at 3 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Oxidoreductase FAD/NAD-binding domain 

- Location within SEQ ID NO 556: from 1 to 78 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 287

- gi No. 3913653 

- Description: FERREDOXIN--NADP REDUCTASE, EMBRYO ISOZYME PRECURSOR
(FNR) >gi|1778686|dbj|BAA13417| (D87547) precursor ferredoxin-NADP+
oxidoreductase [Oryza sativa]

- % Identity: 88.9 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 556: from 1 to 108 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 557 

- Ceres seq_id 1587581 

- Location of start within SEQ ID NO 555: at 120 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 288

- gi No. 3913653 

- Description: FERREDOXIN--NADP REDUCTASE, EMBRYO ISOZYME PRECURSOR
(FNR) >gi|1778686|dbj|BAA13417| (D87547) precursor ferredoxin-NADP+
oxidoreductase [Oryza sativa]

- % Identity: 88.9 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 557: from 1 to 69 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 558 

- Ceres seq_id 1587594 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 559 

- Ceres seq_id 1587595 

- Location of start within SEQ ID NO 558: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glycosyl transferases 

- Location within SEQ ID NO 559: from 171 to 347 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 560

- Ceres seq_id 1587597 

- Location of start within SEQ ID NO 558: at 517 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Glycosyl transferases 

- Location within SEQ ID NO 560: from 1 to 175 aa . 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 561 

- Ceres seq_id 1587598 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 562 

- Ceres seq_id 1587599 

- Location of start within SEQ ID NO 561: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glycosyl transferases 

- Location within SEQ ID NO 562: from 227 to 403 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 563

- Ceres seq_id 1587601 
- Location of start within SEQ ID NO 561: at 610 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Glycosyl transferases 

- Location within SEQ ID NO 563: from 24 to 200 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 564 

- Ceres seq_id 1587617 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 565

- Ceres seq_id 1587618 

- Location of start within SEQ ID NO 564: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Tropomyosins 

- Location within SEQ ID NO 565: from 287 to 461 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 289

- gi No. 2435616 

- Description: (AF026215) No definition line found [Caenorhabditis
elegans]

- % Identity: 71.4

- Alignment Length: 14

- Location of Alignment in SEQ ID NO 565: from 559 to 572

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 566

- Ceres seq_id 1587620

- Location of start within SEQ ID NO 564: at 121 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Tropomyosins

- Location within SEQ ID NO 566: from 247 to 421 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 290

- gi No. 2435616

- Description: (AF026215) No definition line found [Caenorhabditis
elegans]

- % Identity: 71.4

- Alignment Length: 14

- Location of Alignment in SEQ ID NO 566: from 519 to 532

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 567

- Ceres seq_id 1587654
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 568

- Ceres seq_id 1587655

- Location of start within SEQ ID NO 567: at 1 nt.



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 1 
Page 126 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 568: from 188 to 343 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 569

- Ceres seq_id 1587657 

- Location of start within SEQ ID NO 567: at 154 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 569: from 137 to 292 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 570 

- Ceres seq_id 1587778 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 571 

- Ceres seq_id 1587779

- Location of start within SEQ ID NO 570: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 291

- gi No. 4914316 

- Description: (AC005489) F14N23.2 [Arabidopsis thaliana] 

- % Identity: 74.6 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 571: from 1 to 67 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 572 

- Ceres seq_id 1587780 

- Location of start within SEQ ID NO 570: at 15 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 292

- gi No. 4914316 

- Description: (AC005489) F14N23.2 [Arabidopsis thaliana] 

- % Identity: 74.6 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 572: from 1 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 573 

- Ceres seq_id 1587781 

- Location of start within SEQ ID NO 570: at 230 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 574 

- Ceres seq_id 1587821 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 575 

- Ceres seq_id 1587822

- Location of start within SEQ ID NO 574: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 293

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 89.9 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 575: from 45 to 163 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 576

- Ceres seq_id 1587823 

- Location of start within SEQ ID NO 574: at 10 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 294 

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 89.9 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 576: from 42 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 577 

- Ceres seq_id 1587824 

- Location of start within SEQ ID NO 574: at 133 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 295 

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 89.9 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 577: from 1 to 119 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 578

- Ceres seq_id 1587825
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 579

- Ceres seq_id 1587826 

- Location of start within SEQ ID NO 578: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 296 

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 83.3 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 579: from 96 to 124
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 580 

- Ceres seq_id 1587913 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 581 

- Ceres seq_id 1587914 

- Location of start within SEQ ID NO 580: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Fibrillarin 

- Location within SEQ ID NO 581: from 60 to 292 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 297

- gi No. 4914455

- Description: (AL050400) fibrillarin-like protein [Arabidopsis thaliana]

- % Identity: 91.4

- Alignment Length: 305

- Location of Alignment in SEQ ID NO 581: from 1 to 298

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 582 

- Ceres seq_id 1587915 

- Location of start within SEQ ID NO 580: at 168 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Fibrillarin 

- Location within SEQ ID NO 582: from 5 to 237 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 298

- gi No. 4914455

- Description: (AL050400) fibrillarin-like protein [Arabidopsis thaliana]

- % Identity: 91.4 

- Alignment Length: 305 

- Location of Alignment in SEQ ID NO 582: from 1 to 243

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 583 

- Ceres seq_id 1588129 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 584 

- Ceres seq_id 1588130 

- Location of start within SEQ ID NO 583: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 299 

- gi No. 927025 

- Description: (L44134) SPF1-like DNA-binding protein [Cucumis sativus]

- % Identity: 70 

- Alignment Length: 20 

- Location of Alignment in SEQ ID NO 584: from 39 to 57 

Maximum Length Sequence:

related to: 
Clone IDs: 
26678 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 585 

- Ceres seq_id 1592521 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 586 

- Ceres seq_id 1592522 

- Location of start within SEQ ID NO 585: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 587 

- Ceres seq_id 1592523

- Location of start within SEQ ID NO 585: at 124 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 300 

- gi No. 755866 

- Description: (U23165) latex protein homolog [Brassica napus] 

- % Identity: 74.3

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 587: from 80 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 588 

- Ceres seq_id 1592524 

- Location of start within SEQ ID NO 585: at 286 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 301 

- gi No. 755866

- Description: (U23165) latex protein homolog [Brassica napus]

- % Identity: 74.3 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 588: from 26 to 60

Maximum Length Sequence:

related to: 
Clone IDs: 
27062 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 589 

- Ceres seq_id 1592525
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 590

- Ceres seq_id 1592526

- Location of start within SEQ ID NO 589: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 302 

- gi No. 2129641 

- Description: major latex protein type 1 - Arabidopsis thaliana
>gi|1107493|emb|CAA63026| (X91960) major latex protein type1 [Arabidopsis
thaliana]

- % Identity: 72.7

- Alignment Length: 128

- Location of Alignment in SEQ ID NO 590: from 1 to 128 



Maximum Length Sequence: 

related to: 
Clone IDs: 
39215 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 591 

- Ceres seq_id 1592527
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 592

- Ceres seq_id 1592528

- Location of start within SEQ ID NO 591: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 593 

- Ceres seq_id 1592529 

- Location of start within SEQ ID NO 591: at 99 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L5 

- Location within SEQ ID NO 593: from 9 to 62 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 303 

- gi No. 1172969 

- Description: 60S RIBOSOMAL PROTEIN L11 (L16)
>gi|629552|pir||S49033 ribosomal protein L11.e - Arabidopsis thaliana
>gi|550544|emb|CAA57394| (X81798) ribosomal protein L16 [Arabidopsis
thaliana]

- % Identity: 92.6 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 593: from 1 to 80 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 594 

- Ceres seq_id 1592530 

- Location of start within SEQ ID NO 591: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L5 

- Location within SEQ ID NO 594: from 1 to 52 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 304 

- gi No. 1172969 

- Description: 60S RIBOSOMAL PROTEIN L11 (L16)
>gi|629552|pir||S49033 ribosomal protein L11.e - Arabidopsis thaliana
>gi|550544|emb|CAA57394| (X81798) ribosomal protein L16 [Arabidopsis
thaliana]

- % Identity: 92.6 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 594: from 1 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 
41466 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 595

- Ceres seq_id 1592531 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 596 

- Ceres seq_id 1592532 

- Location of start within SEQ ID NO 595: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 305 

- gi No. 3914996 

- Description: PHOSPHOSERINE AMINOTRANSFERASE PRECURSOR (PSAT)
>gi|1665831|dbj|BAA13640| (D88541) phosphoserine aminotransferase
[Arabidopsis thaliana] >gi|2804260|dbj|BAA24441| (AB010408) phosphoserine
aminotransferase [Arabidopsis thaliana]

- % Identity: 70.6 

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 596: from 19 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 597

- Ceres seq_id 1592533

- Location of start within SEQ ID NO 595: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 306

- gi No. 3914996

- Description: PHOSPHOSERINE AMINOTRANSFERASE PRECURSOR (PSAT)
>gi|1665831|dbj|BAA13640| (D88541) phosphoserine aminotransferase
[Arabidopsis thaliana] >gi|2804260|dbj|BAA24441| (AB010408) phosphoserine
aminotransferase [Arabidopsis thaliana]

- % Identity: 70.6 

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 597: from 1 to 33 

Maximum Length Sequence: 

related to: 
Clone IDs: 
93160 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 598

- Ceres seq_id 1592538 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 599 

- Ceres seq_id 1592539 

- Location of start within SEQ ID NO 598: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 600 

- Ceres seq_id 1592540 

- Location of start within SEQ ID NO 598: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 307 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator
precursor; PPT [Arabidopsis thaliana] 

- % Identity: 99 

- Alignment Length: 97 

- Location of Alignment in SEQ ID NO 600: from 1 to 96 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 601 

- Ceres seq_id 1592541 

- Location of start within SEQ ID NO 598: at 11 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 308 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator 
precursor; PPT [Arabidopsis thaliana] 

- % Identity: 99 

- Alignment Length: 97 

- Location of Alignment in SEQ ID NO 601: from 1 to 93 

Maximum Length Sequence:

related to: 
Clone IDs: 
94593 




(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 602 

- Ceres seq_id 1592546 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 603 

- Ceres seq_id 1592547 

- Location of start within SEQ ID NO 602: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 604 

- Ceres seq_id 1592548 

- Location of start within SEQ ID NO 602: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 605 

- Ceres seq_id 1592549 

- Location of start within SEQ ID NO 602: at 40 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 309 

- gi No. 1084336 

- Description: chlorophyll a/b-binding protein type II - 
Arabidopsis thaliana >gi 1541565 (U03395) PSI type II chlorophyll a/b-binding 
protein [Arabidopsis thaliana] 

- % Identity: 98.3 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 605: from 1 to 59 



Maximum Length Sequence:

related to: 
Clone IDs: 
98393 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 606

- Ceres seq_id 1592567 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 607 

- Ceres seq_id 1592568 

- Location of start within SEQ ID NO 606: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase subunit C 

- Location within SEQ ID NO 607: from 24 to 89 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 310

- gi No. 2118222 

- Description: H+-transporting ATPase (EC 3.6.1.35), vacuolar, 16K
chain (clone AVA-P4) - Arabidopsis thaliana >gi|926935 (L44584) vacuolar
H+-pumping ATPase 16 kDa proteolipid [Arabidopsis thaliana]
>gi|5053005|gb|AAD38803.1|AF153677_1 (AF153677) vacuolar

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 607: from 10 to 100 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 608 

- Ceres seq_id 1592569 

- Location of start within SEQ ID NO 606: at 29 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ATP synthase subunit C 

- Location within SEQ ID NO 608: from 15 to 80 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 311 

- gi No. 2118222 

- Description: H+-transporting ATPase (EC 3.6.1.35), vacuolar, 16K
chain (clone AVA-P4) - Arabidopsis thaliana >gi|926935 (L44584) vacuolar
H+-pumping ATPase 16 kDa proteolipid [Arabidopsis thaliana]
>gi|5053005|gb|AAD38803.1|AF153677_1 (AF153677) vacuolar

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 608: from 1 to 91 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 609 

- Ceres seq_id 1592570 

- Location of start within SEQ ID NO 606: at 113 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- ATP synthase subunit C 

- Location within SEQ ID NO 609: from 1 to 52 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 312 

- gi No. 2118222 

- Description: H+-transporting ATPase (EC 3.6.1.35), vacuolar, 16K
chain (clone AVA-P4) - Arabidopsis thaliana >gi|926935 (L44584) vacuolar
H+-pumping ATPase 16 kDa proteolipid [Arabidopsis thaliana]
>gi|5053005|gb|AAD38803.1|AF153677_1 (AF153677) vacuolar

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 609: from 1 to 63 

Maximum Length Sequence:

related to: 
Clone IDs: 

104983 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 610 

- Ceres seq_id 1592602 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 611 

- Ceres seq_id 1592603 

- Location of start within SEQ ID NO 610: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 612 

- Ceres seq_id 1592604 

- Location of start within SEQ ID NO 610: at 88 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 313 

- gi No. 2832664 

- Description: (AL021710) pollen-specific protein - like 
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 612: from 1 to 98 

Maximum Length Sequence:

related to:
Clone IDs:

108482 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 613 

- Ceres seq_id 1592627 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 614 

- Ceres seq_id 1592628 

- Location of start within SEQ ID NO 613: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Adenylate kinase 

- Location within SEQ ID NO 614: from 19 to 140 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 314 

- gi No. 2497486 

- Description: URIDYLATE KINASE (UK) (URIDINE MONOPHOSPHATE KINASE)
(UMP KINASE) >gi|2121275 (AF000147) UMP/CMP kinase [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 614: from 1 to 140

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 615 

- Ceres seq_id 1592629 

- Location of start within SEQ ID NO 613: at 262 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Adenylate kinase 

- Location within SEQ ID NO 615: from 1 to 79 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 315 

- gi No. 2497486 

- Description: URIDYLATE KINASE (UK) (URIDINE MONOPHOSPHATE KINASE) 
(UMP KINASE) >gi|2121275 (AF000147) UMP/CMP kinase [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 615: from 1 to 79

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 616 

- Ceres seq_id 1592630 

- Location of start within SEQ ID NO 613: at 274 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Adenylate kinase 

- Location within SEQ ID NO 616: from 1 to 75 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 316 

- gi No. 2497486 

- Description: URIDYLATE KINASE (UK) (URIDINE MONOPHOSPHATE KINASE) 
(UMP KINASE) >gi|2121275 (AF000147) UMP/CMP kinase [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 140 

- Location of Alignment in SEQ ID NO 616: from 1 to 75 

Maximum Length Sequence: 

related to: 
Clone IDs: 

110339 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 617 

- Ceres seq_id 1592639 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 618 

- Ceres seq_id 1592640 

- Location of start within SEQ ID NO 617: at 23 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 317 

- gi No. 4406787 

- Description: (AC006532) NADH dehydrogenase [Arabidopsis thaliana]

- % Identity: 99 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 618: from 1 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 619 

- Ceres seq_id 1592641 

- Location of start within SEQ ID NO 617: at 50 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 318 

- gi No. 4406787 

- Description: (AC006532) NADH dehydrogenase [Arabidopsis thaliana] 

- % Identity: 99 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 619: from 1 to 94 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 620

- Ceres seq_id 1592642

- Location of start within SEQ ID NO 617: at 71 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 319 

- gi No. 4406787 

- Description: (AC006532) NADH dehydrogenase [Arabidopsis thaliana]

- % Identity: 99 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 620: from 1 to 87 



Maximum Length Sequence:

related to: 
Clone IDs: 

110638 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 621 

- Ceres seq_id 1592649 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 622 

- Ceres seq_id 1592650 

- Location of start within SEQ ID NO 621: at 148 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pyruvate kinase 

- Location within SEQ ID NO 622: from 30 to 111 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 623 

- Ceres seq_id 1592651 

- Location of start within SEQ ID NO 621: at 187 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pyruvate kinase 

- Location within SEQ ID NO 623: from 17 to 98 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 624 

- Ceres seq_id 1592652 

- Location of start within SEQ ID NO 621: at 307 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pyruvate kinase 

- Location within SEQ ID NO 624: from 1 to 58 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to:
Clone IDs:

111347

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 625 

- Ceres seq_id 1592659
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 626

- Ceres seq_id 1592660

- Location of start within SEQ ID NO 625: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 320 

- gi No. 1709535 

- Description: DELTA 1-PYRROLINE-5-CARBOXYLATE SYNTHETASE B (P5CS
B) [CONTAINS: GLUTAMATE 5-KINASE (GAMMA-GLUTAMYL KINASE) (GK); GAMMA-GLUTAMYL
PHOSPHATE REDUCTASE (GPR) (GLUTAMATE-5-SEMIALDEHYDE DEHYDROGENASE) synthetase
[Arabidopsis thaliana]

- % Identity: 98.4 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 626: from 1 to 127 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 627 

- Ceres seq_id 1592661 

- Location of start within SEQ ID NO 625: at 85 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 321 

- gi No. 1709535 

- Description: DELTA 1-PYRROLINE-5-CARBOXYLATE SYNTHETASE B (P5CS
B) [CONTAINS: GLUTAMATE 5-KINASE (GAMMA-GLUTAMYL KINASE) (GK); GAMMA-GLUTAMYL
PHOSPHATE REDUCTASE (GPR) (GLUTAMATE-5-SEMIALDEHYDE DEHYDROGENASE) synthetase
[Arabidopsis thaliana]

- % Identity: 98.4 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 627: from 1 to 99 

Maximum Length Sequence: 

related to: 
Clone IDs: 

114611 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 628 

- Ceres seq_id 1592685 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 629

- Ceres seq_id 1592686

- Location of start within SEQ ID NO 628: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 630 

- Ceres seq_id 1592687 

- Location of start within SEQ ID NO 628: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 322 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator 
precursor; PPT [Arabidopsis thaliana] 

- % Identity: 99.2 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 630: from 1 to 126 

Maximum Length Sequence: 

related to: 
Clone IDs: 

118747 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 631 

- Ceres seq_id 1592718 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 632 

- Ceres seq_id 1592719 

- Location of start within SEQ ID NO 631: at 63 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 323 

- gi No. 2129641 

- Description: major latex protein type 1 - Arabidopsis thaliana 
>gi|1107493|emb|CAA63026| (X91960) major latex protein type1 [Arabidopsis
thaliana] 

- % Identity: 71.9 

- Alignment Length: 121 

- Location of Alignment in SEQ ID NO 632: from 1 to 121 

Maximum Length Sequence:

related to: 
Clone IDs: 

122840 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 633 

- Ceres seq_id 1592742 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 634 

- Ceres seq_id 1592743 

- Location of start within SEQ ID NO 633: at 106 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 324 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator 
precursor; PPT [Arabidopsis thaliana] 

- % Identity: 99.2 

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 634: from 1 to 121 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 635 

- Ceres seq_id 1592744

- Location of start within SEQ ID NO 633: at 211 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 325 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator
precursor; PPT [Arabidopsis thaliana]

- % Identity: 99.2

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 635: from 1 to 86 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 636 

- Ceres seq_id 1592745 

- Location of start within SEQ ID NO 633: at 259 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 326 

- gi No. 1778141 

- Description: (U66321) phosphate/phosphoenolpyruvate translocator
precursor; PPT [Arabidopsis thaliana]

- % Identity: 99.2

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 636: from 1 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 

124632 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 637 

- Ceres seq_id 1592760 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 638 

- Ceres seq_id 1592761 

- Location of start within SEQ ID NO 637: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 60s Acidic ribosomal protein 

- Location within SEQ ID NO 638: from 62 to 138 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 327 

- gi No. 1710591 

- Description: 60S ACIDIC RIBOSOMAL PROTEIN P2 

- % Identity: 96.2 

- Alignment Length: 78

- Location of Alignment in SEQ ID NO 638: from 61 to 138 



Maximum Length Sequence: 

related to: 
Clone IDs: 

143402 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 639 

- Ceres seq_id 1592789
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 640 

- Ceres seq_id 1592790 

- Location of start within SEQ ID NO 639: at 249 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 640: from 1 to 81 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 328 

- gi No. 5734741 

- Description: (AC007651) Similar to Ubiquitin Conjugating Enzyme 
[Arabidopsis thaliana] 

- % Identity: 98.8 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 640: from 1 to 81 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 641 

- Ceres seq_id 1592791 

- Location of start within SEQ ID NO 639: at 319 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 642

- Ceres seq_id 1592792

- Location of start within SEQ ID NO 639: at 343 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

144256 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 643 

- Ceres seq_id 1592801 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 644

- Ceres seq_id 1592802

- Location of start within SEQ ID NO 643: at 174 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 329 

- gi No. 5452942 

- Description: (AF066061) glucosidase II beta-subunit [Mus musculus]

- % Identity: 76.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 644: from 34 to 54

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 645

- Ceres seq_id 1592803 

- Location of start within SEQ ID NO 643: at 216 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 330 

- gi No. 5452942 

- Description: (AF066061) glucosidase II beta-subunit [Mus musculus]

- % Identity: 76.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 645: from 20 to 40 

Maximum Length Sequence: 

related to: 
Clone IDs: 

145649 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 646 

- Ceres seq_id 1592822 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 647 

- Ceres seq_id 1592823 

- Location of start within SEQ ID NO 646: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 648

- Ceres seq_id 1592824

- Location of start within SEQ ID NO 646: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 331 

- gi No. 2407802 

- Description: (Y12576) histone H2B [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 648: from 1 to 12 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 649

- Ceres seq_id 1592825

- Location of start within SEQ ID NO 646: at 304 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- HIT family 

- Location within SEQ ID NO 649: from 18 to 61 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 332 

- gi No. 629858 

- Description: protein kinase C inhibitor - maize 

- % Identity: 80.6

- Alignment Length: 62

- Location of Alignment in SEQ ID NO 649: from 1 to 61

Maximum Length Sequence: 

related to: 
Clone IDs: 

146133 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 650

- Ceres seq_id 1592832 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 651 

- Ceres seq_id 1592833 

- Location of start within SEQ ID NO 650: at 116 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 333 

- gi No. 3355483 

- Description: (AC004218) gibberellin-regulated protein (GASA5) 
like [Arabidopsis thaliana] 

- % Identity: 76.4

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 651: from 1 to 89

Maximum Length Sequence:

related to: 
Clone IDs: 

147156 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 652 

- Ceres seq_id 1592838
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 653

- Ceres seq_id 1592839

- Location of start within SEQ ID NO 652: at 59 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L30 

- Location within SEQ ID NO 653: from 84 to 127 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 334 

- gi No. 445613 

- Description: ribosomal protein L7 [Solanum tuberosum] 

- % Identity: 74.8 

- Alignment Length: 123 

- Location of Alignment in SEQ ID NO 653: from 5 to 127 



Maximum Length Sequence:

related to: 
Clone IDs: 

147603 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 654

- Ceres seq_id 1592844 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 655 

- Ceres seq_id 1592845

- Location of start within SEQ ID NO 654: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 335 

- gi No. 642484 

- Description: (U16371) androgen receptor [Homo sapiens] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 655: from 89 to 99

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 656 

- Ceres seq_id 1592846 

- Location of start within SEQ ID NO 654: at 60 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 336 

- gi No. 642484 

- Description: (U16371) androgen receptor [Homo sapiens] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 656: from 70 to 80 

Maximum Length Sequence: 

related to: 
Clone IDs: 

148965 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 657 

- Ceres seq_id 1592855 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 658 

- Ceres seq_id 1592856 

- Location of start within SEQ ID NO 657: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic initiation factor 5A hypusine (eIF-5A) 

- Location within SEQ ID NO 658: from 38 to 89 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 337 

- gi No. 100277 

- Description: translation initiation factor eIF-5A.1 - curled-leaved
tobacco (fragment) >gi|829282|emb|CAA45103| (X63541) eukaryotic
initiation factor 5A (1) [Nicotiana plumbaginifolia]

- % Identity: 71.4 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 658: from 41 to 75 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 659

- Ceres seq_id 1592857

- Location of start within SEQ ID NO 657: at 84 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic initiation factor 5A hypusine (eIF-5A)

- Location within SEQ ID NO 659: from 11 to 62 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 338 

- gi No. 100277 

- Description: translation initiation factor eIF-5A.1 - curled-leaved
tobacco (fragment) >gi|829282|emb|CAA45103| (X63541) eukaryotic
initiation factor 5A (1) [Nicotiana plumbaginifolia]

- % Identity: 71.4 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 659: from 14 to 48 

Maximum Length Sequence: 

related to: 
Clone IDs: 

149101 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 660 

- Ceres seq_id 1592862 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 661 

- Ceres seq_id 1592863

- Location of start within SEQ ID NO 660: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 339 

- gi No. 3478700 

- Description: (AF034387) AFT protein [Arabidopsis thaliana] 

- % Identity: 99.3 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 661: from 1 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 662

- Ceres seq_id 1592864 

- Location of start within SEQ ID NO 660: at 193 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 340

- gi No. 3478700

- Description: (AF034387) AFT protein [Arabidopsis thaliana]

- % Identity: 99.3 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 662: from 1 to 98 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 663 

- Ceres seq_id 1592865 

- Location of start within SEQ ID NO 660: at 232 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 341 

- gi No. 3478700 

- Description: (AF034387) AFT protein [Arabidopsis thaliana]

- % Identity: 99.3

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 663: from 1 to 85 

Maximum Length Sequence: 

related to: 
Clone IDs: 

149228 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 664 

- Ceres seq_id 1592866 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 665 

- Ceres seq_id 1592867 

- Location of start within SEQ ID NO 664: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Terpene synthase family 

- Location within SEQ ID NO 665: from 16 to 118 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 666

- Ceres seq_id 1592868

- Location of start within SEQ ID NO 664: at 26 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Terpene synthase family 

- Location within SEQ ID NO 666: from 8 to 110 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

150409 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 667 

- Ceres seq_id 1592877 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 668 

- Ceres seq_id 1592878 

- Location of start within SEQ ID NO 667: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glutathione S-transferases.

- Location within SEQ ID NO 668: from 9 to 106 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

152522 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 669 

- Ceres seq_id 1592898
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 670

- Ceres seq_id 1592899

- Location of start within SEQ ID NO 669: at 54 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 670: from 35 to 80 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 342 

- gi No. 2129650 

- Description: myb-related transcription factor 24, 7K - Arabidopsis
thaliana >gi|1197190|emb|CAA92280| (Z68157) myb-related transcription factor
[Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 670: from 1 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 671 

- Ceres seq_id 1592900 

- Location of start within SEQ ID NO 669: at 96 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 671: from 21 to 66 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 343

- gi No. 2129650

- Description: myb-related transcription factor 24, 7K - Arabidopsis
thaliana >gi|1197190|emb|CAA92280| (Z68157) myb-related transcription factor
[Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 671: from 1 to 121 

Maximum Length Sequence: 

related to: 
Clone IDs: 

154383 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 672 

- Ceres seq_id 1592920
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 673

- Ceres seq_id 1592921

- Location of start within SEQ ID NO 672: at 139 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 673: from 4 to 118 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 344

- gi No. 1168529

- Description: SERINE/THREONINE-PROTEIN KINASE ASK1
>gi|541890|pir||S36944 probable serine/threonine-specific protein kinase (EC
2.7.1.-) (clone ASK1) - Arabidopsis thaliana >gi|166882 (M91548)
serine/threonine kinase [Arabidopsis thaliana] >gi|1931648

- % Identity: 97.5 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 673: from 1 to 118 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 674

- Ceres seq_id 1592922

- Location of start within SEQ ID NO 672: at 199 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 674: from 1 to 98 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 345

- gi No. 1168529

- Description: SERINE/THREONINE-PROTEIN KINASE ASK1
>gi|541890|pir||S36944 probable serine/threonine-specific protein kinase (EC
2.7.1.-) (clone ASK1) - Arabidopsis thaliana >gi|166882 (M91548)
serine/threonine kinase [Arabidopsis thaliana] >gi|1931648

- % Identity: 97.5 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 674: from 1 to 98

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 675

- Ceres seq_id 1592923

- Location of start within SEQ ID NO 672: at 232 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 675: from 1 to 87 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 346

- gi No. 1168529

- Description: SERINE/THREONINE-PROTEIN KINASE ASK1
>gi|541890|pir||S36944 probable serine/threonine-specific protein kinase (EC
2.7.1.-) (clone ASK1) - Arabidopsis thaliana >gi|166882 (M91548)
serine/threonine kinase [Arabidopsis thaliana] >gi|1931648

- % Identity: 97.5 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 675: from 1 to 87 

Maximum Length Sequence:

related to: 
Clone IDs: 

155007 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 676

- Ceres seq_id 1592924 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 677 

- Ceres seq_id 1592925 

- Location of start within SEQ ID NO 676: at 37 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 347 

- gi No. 3123331 

- Description: (AJ005930) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 99.3

- Alignment Length: 146

- Location of Alignment in SEQ ID NO 677: from 3 to 147 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 678

- Ceres seq_id 1592926

- Location of start within SEQ ID NO 676: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 348

- gi No. 3123331

- Description: (AJ005930) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 99.3

- Alignment Length: 146

- Location of Alignment in SEQ ID NO 678: from 1 to 133 

Maximum Length Sequence: 

related to: 
Clone IDs: 

155827 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 679

- Ceres seq_id 1592932 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 680 

- Ceres seq_id 1592933 

- Location of start within SEQ ID NO 679: at 206 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 349

- gi No. 2462746

- Description: (AC002292) Similar to ATP-citrate-lyase [Arabidopsis
thaliana]

- % Identity: 94.9 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 680: from 1 to 97 



Maximum Length Sequence: 

related to: 
Clone IDs: 

158695 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 681 

- Ceres seq_id 1592963 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 682 

- Ceres seq_id 1592964

- Location of start within SEQ ID NO 681: at 84 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 350 

- gi No. 421836 

- Description: G-box-binding factor GF14 - Arabidopsis thaliana
>gi|553040 (M96855) GF14 [Arabidopsis thaliana]

- % Identity: 78.6 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 682: from 8 to 21 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 683 

- Ceres seq_id 1592965 

- Location of start within SEQ ID NO 681: at 170 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 683: from 1 to 98 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 351

- gi No. 1702987

- Description: 14-3-3-LIKE PROTEIN GF14 PHI >gi|1493805 (L09111)
GF14 protein phi chain [Arabidopsis thaliana] >gi|2232146 (AF001414)
14-3-3-like protein GF14 phi [Arabidopsis thaliana]

- % Identity: 96.4 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 683: from 1 to 98 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 684 

- Ceres seq_id 1592966 

- Location of start within SEQ ID NO 681: at 182 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 684: from 1 to 94 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 352 

- gi No. 1702987 

- Description: 14-3-3-LIKE PROTEIN GF14 PHI >gi|1493805 (L09111)
GF14 protein phi chain [Arabidopsis thaliana] >gi|2232146 (AF001414)
14-3-3-like protein GF14 phi [Arabidopsis thaliana]

- % Identity: 96.4 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 684: from 1 to 94 



Maximum Length Sequence:

related to: 
Clone IDs: 

159039 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 685 

- Ceres seq_id 1592975 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 686

- Ceres seq_id 1592976

- Location of start within SEQ ID NO 685: at 62 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 353 

- gi No. 1769905 

- Description: (X98108) 23 kDa polypeptide of oxygen-evolving
complex (OEC) [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 145 

- Location of Alignment in SEQ ID NO 686: from 1 to 144

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 687 

- Ceres seq_id 1592977 

- Location of start within SEQ ID NO 685: at 198 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
17903 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 688 

- Ceres seq_id 1593000 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 689

- Ceres seq_id 1593001

- Location of start within SEQ ID NO 688: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 354 

- gi No. 5454190 

- Description: (AC005698) T3P18.4 [Arabidopsis thaliana] 

- % Identity: 92.7 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 689: from 32 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 690 

- Ceres seq_id 1593002 

- Location of start within SEQ ID NO 688: at 96 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 355 

- gi No. 5454190 

- Description: (AC005698) T3P18.4 [Arabidopsis thaliana] 

- % Identity: 92.7 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 690: from 1 to 82

Maximum Length Sequence:

related to: 
Clone IDs: 

206245 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 691 

- Ceres seq_id 1593014
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 692

- Ceres seq_id 1593015 

- Location of start within SEQ ID NO 691: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 356

- gi No. 4587564

- Description: (AC006550) Strong similarity to gb|X14017
photosystem I reaction centre subunit II precursor (psaD) from Spinacia
oleracea. ESTs gb|R30423, gb|T42998, gb|Z18178, gb|T14133, gb|N65521,
gb|T42498, gb|T41918, gb|N38024...

- % Identity: 77.8 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 692: from 16 to 33 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 693 

- Ceres seq_id 1593016 

- Location of start within SEQ ID NO 691: at 47 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 357 

- gi No. 4587564 

- Description: (AC006550) Strong similarity to gb|X14017 
photosystem I reaction centre subunit II precursor (psaD) from Spinacia 
oleracea. ESTs gb|R30423, gb|T42998, gb|Z18178, gb|T14133, gb|N65521, 
gb|T42498, gb|T41918, gb|N38024... 

- % Identity: 77.8 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 693: from 1 to 18 

Maximum Length Sequence: 

related to: 
Clone IDs: 

206636 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 694 

- Ceres seq_id 1593017 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 695 

- Ceres seq_id 1593018 

- Location of start within SEQ ID NO 694: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 






(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 696

- Ceres seq_id 1593019

- Location of start within SEQ ID NO 694: at 84 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 697 

- Ceres seq_id 1593020 

- Location of start within SEQ ID NO 694: at 235 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 358 

- gi No. 4959712 

- Description: (AF136010) SPP30 [Solanum chacoense]

- % Identity: 95.9

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 697: from 9 to 57 



Maximum Length Sequence: 

related to: 
Clone IDs: 
28703 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 698

- Ceres seq_id 1593051
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 699 

- Ceres seq_id 1593052 

- Location of start within SEQ ID NO 698: at 155 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 359

- gi No. 3929364

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 23 KD SUBUNIT
PRECURSOR (COMPLEX I-23KD) (CI-23KD) >gi|1076356|pir||S52380 NADH
dehydrogenase (EC 1.6.99.3) - Arabidopsis thaliana >gi|666977|emb|CAA59061|
(X84318) NADH dehydrogenase [Arabidopsis

- % Identity: 87.1 

- Alignment Length: 101 

- Location of Alignment in SEQ ID NO 699: from 1 to 100 



Maximum Length Sequence: 

related to: 
Clone IDs: 
29272 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 700 

- Ceres seq_id 1593053 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 701 

- Ceres seq_id 1593054 

- Location of start within SEQ ID NO 700: at 44 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SRF-type transcription factor (DNA-binding and dimerisation
domain)

- Location within SEQ ID NO 701: from 1 to 59 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 360

- gi No. 3912988

- Description: FLORAL HOMEOTIC PROTEIN AGL9 >gi|2345158 (AF015552)
AGL9 [Arabidopsis thaliana] >gi|2829878 (AC002396) AGL9 [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 701: from 1 to 111 



Maximum Length Sequence: 

related to: 
Clone IDs: 
32693 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 702

- Ceres seq_id 1593057 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 703 

- Ceres seq_id 1593058 

- Location of start within SEQ ID NO 702: at 35 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin family

- Location within SEQ ID NO 703: from 1 to 76 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 361 

- gi No. 99772 

- Description: ubiquitin 81-aa extension protein 2 - Arabidopsis
thaliana >gi|166936 (J05540) ubiquitin extension protein (UBQ6) [Arabidopsis
thaliana] >gi|3522953|gb|AAC34235.1| (AC004411) ubiquitin extension protein
(UBQ6) [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 113

- Location of Alignment in SEQ ID NO 703: from 1 to 113

Maximum Length Sequence:

related to: 
Clone IDs: 
40702 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 704 

- Ceres seq_id 1593077
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 705 

- Ceres seq_id 1593078 

- Location of start within SEQ ID NO 704: at 38 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif

- Location within SEQ ID NO 705: from 34 to 90 aa.



(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 706 

- Ceres seq_id 1593079 

- Location of start within SEQ ID NO 704: at 53 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 706: from 29 to 85 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

207676 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 707

- Ceres seq_id 1593101 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 708 

- Ceres seq_id 1593102 

- Location of start within SEQ ID NO 707: at 54 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- KH domain 

- Location within SEQ ID NO 708: from 47 to 95 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 362 

- gi No. 1173253 

- Description: 40S RIBOSOMAL PROTEIN S3 >gi|543317|pir||S41170
ribosomal protein S3 - mouse >gi|57728|emb|CAA35916| (X51536) ribosomal
protein S3 (AA 1-243) [Rattus rattus] >gi|439522|emb|CAA54167| (X76772)
ribosomal protein S3 [Mus musculus]

- % Identity: 87.4

- Alignment Length: 111

- Location of Alignment in SEQ ID NO 708: from 1 to 111

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 709 

- Ceres seq_id 1593103 

- Location of start within SEQ ID NO 707: at 183 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- KH domain 

- Location within SEQ ID NO 709: from 4 to 52 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 363 

- gi No. 1173253 

- Description: 40S RIBOSOMAL PROTEIN S3 >gi|543317|pir||S41170
ribosomal protein S3 - mouse >gi|57728|emb|CAA35916| (X51536) ribosomal
protein S3 (AA 1-243) [Rattus rattus] >gi|439522|emb|CAA54167| (X76772)
ribosomal protein S3 [Mus musculus]

- % Identity: 87.4 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 709: from 1 to 68

Maximum Length Sequence:

related to: 
Clone IDs: 

147930 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 710 

- Ceres seq_id 1593104 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 711 

- Ceres seq_id 1593105 

- Location of start within SEQ ID NO 710: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 364 

- gi No. 131398 

- Description: PHOTOSYSTEM II 10 KD POLYPEPTIDE PRECURSOR
>gi|72714|pir||F2MU10 photosystem II 10K protein precursor - Arabidopsis
thaliana >gi|16447|emb|CAA39441| (X55970) photosystem II 10 kDa polypeptide
[Arabidopsis thaliana] >gi|3152571 (AC002986)

- % Identity: 84.6 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 711: from 1 to 52 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 712 

- Ceres seq_id 1593106 

- Location of start within SEQ ID NO 710: at 78 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 365 

- gi No. 131398 

- Description: PHOTOSYSTEM II 10 KD POLYPEPTIDE PRECURSOR
>gi|72714|pir||F2MU10 photosystem II 10K protein precursor - Arabidopsis
thaliana >gi|16447|emb|CAA39441| (X55970) photosystem II 10 kDa polypeptide
[Arabidopsis thaliana] >gi|3152571 (AC002986)

- % Identity: 84.6 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 712: from 1 to 47 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 713 

- Ceres seq_id 1593107

- Location of start within SEQ ID NO 710: at 236 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 366

- gi No. 131398

- Description: PHOTOSYSTEM II 10 KD POLYPEPTIDE PRECURSOR
>gi|72714|pir||F2MU10 photosystem II 10K protein precursor - Arabidopsis
thaliana >gi|16447|emb|CAA39441| (X55970) photosystem II 10 kDa polypeptide
[Arabidopsis thaliana] >gi|3152571 (AC002986)

- % Identity: 94.3 

- Alignment Length: 88 

- Location of Alignment in SEQ ID NO 713: from 1 to 74

Maximum Length Sequence:

related to: 
Clone IDs: 

142813 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 714 

- Ceres seq_id 1593153 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 715 

- Ceres seq_id 1593154 

- Location of start within SEQ ID NO 714: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S10 

- Location within SEQ ID NO 715: from 48 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 367 

- gi No. 5541704 

- Description: (AL096860) 40S RIBOSOMAL PROTEIN S20 homolog 
[Arabidopsis thaliana] 

- % Identity: 95.9 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 715: from 29 to 102 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 716 

- Ceres seq_id 1593155

- Location of start within SEQ ID NO 714: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 717 

- Ceres seq_id 1593156 

- Location of start within SEQ ID NO 714: at 94 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S10 

- Location within SEQ ID NO 717: from 17 to 71 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 368 

- gi No. 5541704 

- Description: (AL096860) 40S RIBOSOMAL PROTEIN S20 homolog 
[Arabidopsis thaliana] 

- % Identity: 95.9 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 717: from 1 to 71 



Maximum Length Sequence: 

related to: 
Clone IDs: 
41039 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 718 

- Ceres seq_id 1593183
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 719 

- Ceres seq_id 1593184 

- Location of start within SEQ ID NO 718: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 369 

- gi No. 458468 

- Description: (U07025) chitinase [Janthinobacterium lividum]

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 719: from 83 to 99 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 720 

- Ceres seq_id 1593185 

- Location of start within SEQ ID NO 718: at 64 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 370

- gi No. 458468 

- Description: (U07025) chitinase [Janthinobacterium lividum] 

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 720: from 80 to 96 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 721 

- Ceres seq_id 1593186 

- Location of start within SEQ ID NO 718: at 148 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 371 

- gi No. 458468 

- Description: (U07025) chitinase [Janthinobacterium lividum]

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 721: from 52 to 68 



Maximum Length Sequence: 

related to: 
Clone IDs: 

106269 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 722 

- Ceres seq_id 1593202 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 723 

- Ceres seq_id 1593203

- Location of start within SEQ ID NO 722: at 176 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 372

- gi No. 1170169

- Description: HOMEOBOX-LEUCINE ZIPPER PROTEIN HAT 2 (HD-ZIP PROTEIN
2) >gi|549886 (U09335) homeobox protein [Arabidopsis thaliana]

- % Identity: 80 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 723: from 77 to 111 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 724 

- Ceres seq_id 1593204 

- Location of start within SEQ ID NO 722: at 179 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 373 

- gi No. 1170169 

- Description: HOMEOBOX-LEUCINE ZIPPER PROTEIN HAT 2 (HD-ZIP PROTEIN
2) >gi|549886 (U09335) homeobox protein [Arabidopsis thaliana]

- % Identity: 80 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 724: from 76 to 110 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 725 

- Ceres seq_id 1593205

- Location of start within SEQ ID NO 722: at 182 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 374

- gi No. 1170169

- Description: HOMEOBOX-LEUCINE ZIPPER PROTEIN HAT 2 (HD-ZIP PROTEIN
2) >gi|549886 (U09335) homeobox protein [Arabidopsis thaliana]

- % Identity: 80 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 725: from 75 to 109 

Maximum Length Sequence: 

related to: 
Clone IDs: 

116470 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 726 

- Ceres seq_id 1593228 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 727 

- Ceres seq_id 1593229 

- Location of start within SEQ ID NO 726: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Uncharacterized protein family UPF0016 

- Location within SEQ ID NO 727: from 33 to 174 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 375

- gi No. 5734713 

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 147 

- Location of Alignment in SEQ ID NO 727: from 29 to 175 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 728 

- Ceres seq_id 1593230 

- Location of start within SEQ ID NO 726: at 35 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Uncharacterized protein family UPF0016 

- Location within SEQ ID NO 728: from 5 to 146 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 376

- gi No. 5734713

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 147 

- Location of Alignment in SEQ ID NO 728: from 1 to 147 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 729 

- Ceres seq_id 1593231 

- Location of start within SEQ ID NO 726: at 124 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Uncharacterized protein family UPF0016 

- Location within SEQ ID NO 729: from 1 to 133 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 377 

- gi No. 5734713 

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 147 

- Location of Alignment in SEQ ID NO 729: from 1 to 134



Maximum Length Sequence: 

related to: 
Clone IDs: 
34299 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 730 

- Ceres seq_id 1593275 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 731

- Ceres seq_id 1593276 

- Location of start within SEQ ID NO 730: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 378

- gi No. 227070 

- Description: ribosomal protein CS-S5 [Spinacia oleracea] 

- % Identity: 71.4

- Alignment Length: 14

- Location of Alignment in SEQ ID NO 731: from 1 to 14 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 732 

- Ceres seq_id 1593277 

- Location of start within SEQ ID NO 730: at 141 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 
6278 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 733

- Ceres seq_id 1593331 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 734 

- Ceres seq_id 1593332 

- Location of start within SEQ ID NO 733: at 56 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 735

- Ceres seq_id 1593333

- Location of start within SEQ ID NO 733: at 293 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 379

- gi No. 627071 

- Description: histidine-rich protein - Plasmodium lophurae
(fragment) >gi|552196 (M15317) histidine-rich protein [Plasmodium lophurae]

- % Identity: 78.6 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 735: from 31 to 44 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 736 

- Ceres seq_id 1593334

- Location of start within SEQ ID NO 733: at 353 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 380 

- gi No. 627071 

- Description: histidine-rich protein - Plasmodium lophurae
(fragment) >gi|552196 (M15317) histidine-rich protein [Plasmodium lophurae]

- % Identity: 78.6 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 736: from 11 to 24 
Maximum Length Sequence:

related to:
Clone IDs: 

109841 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 737 

- Ceres seq_id 1593344
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 738 

- Ceres seq_id 1593345 

- Location of start within SEQ ID NO 737: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Dehydrins 

- Location within SEQ ID NO 738: from 77 to 132 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 381 

- gi No. 81627 

- Description: glycine-rich protein 5 - Arabidopsis thaliana
>gi|259451|bbs|117616 (S47414) glycine-rich protein, atGRP {clone atGRP-5}
[Arabidopsis thaliana, C24, Peptide Partial, 173 aa] [Arabidopsis thaliana] 

- % Identity: 78.3 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 738: from 76 to 133

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 739 

- Ceres seq_id 1593346 

- Location of start within SEQ ID NO 737: at 57 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Dehydrins 

- Location within SEQ ID NO 739: from 59 to 114 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 382

- gi No. 81627 

- Description: glycine-rich protein 5 - Arabidopsis thaliana
>gi|259451|bbs|117616 (S47414) glycine-rich protein, atGRP {clone atGRP-5}
[Arabidopsis thaliana, C24, Peptide Partial, 173 aa] [Arabidopsis thaliana] 

- % Identity: 78.3 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 739: from 58 to 115 

Maximum Length Sequence: 

related to: 
Clone IDs: 

248523 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 740

- Ceres seq_id 1593394 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 741 

- Ceres seq_id 1593395 

- Location of start within SEQ ID NO 740: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Skp1 family

- Location within SEQ ID NO 741: from 60 to 165 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 383 

- gi No. 1432083 

- Description: (U60981) homolog to Skp1p, an evolutionarily
conserved kinetochore protein in budding yeast [Arabidopsis thaliana]
>gi|3068807 (AF059294) Skp1 homolog [Arabidopsis thaliana] >gi|3719209
(U97020) UIP1 [Arabidopsis thaliana]

- % Identity: 95.5 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 741: from 57 to 165 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 742 

- Ceres seq_id 1593396 

- Location of start within SEQ ID NO 740: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Skp1 family

- Location within SEQ ID NO 742: from 4 to 109 aa. 

(Dp) Related Amino Acid Sequences 
- Alignment No. 384

- gi No. 1432083 

- Description: (U60981) homolog to Skp1p, an evolutionarily
conserved kinetochore protein in budding yeast [Arabidopsis thaliana]
>gi|3068807 (AF059294) Skp1 homolog [Arabidopsis thaliana] >gi|3719209
(U97020) UIP1 [Arabidopsis thaliana]

- % Identity: 95.5 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 742: from 1 to 109 

Maximum Length Sequence: 

related to: 
Clone IDs: 

253595 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 743

- Ceres seq_id 1593424 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 744 

- Ceres seq_id 1593425 

- Location of start within SEQ ID NO 743: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S16 

- Location within SEQ ID NO 744: from 18 to 79 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 745 

- Ceres seq_id 1593426 

- Location of start within SEQ ID NO 743: at 28 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S16

- Location within SEQ ID NO 745: from 9 to 70 aa.
(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

255665 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 746

- Ceres seq_id 1593441 
(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 747 

- Ceres seq_id 1593442 

- Location of start within SEQ ID NO 746: at 90 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 385 

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 98.5 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 747: from 69 to 134 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 748 

- Ceres seq_id 1593443

- Location of start within SEQ ID NO 746: at 174 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 386

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 98.5 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 748: from 41 to 106 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 749 

- Ceres seq_id 1593444 

- Location of start within SEQ ID NO 746: at 267 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 387 

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 98.5 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 749: from 10 to 75 



Maximum Length Sequence: 

related to: 
Clone IDs: 

259799 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 750

- Ceres seq_id 1593459 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 751 

- Ceres seq_id 1593460 

- Location of start within SEQ ID NO 750: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 388 

- gi No. 4884033 

- Description: (AJ133753) peptide methionine sulfoxide reductase 
[Arabidopsis thaliana] >gi|4884035|emb|CAB43187.1| (AJ133754) peptide
methionine sulfoxide reductase [Arabidopsis thaliana] 

- % Identity: 76.2 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 751: from 41 to 170 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 752

- Ceres seq_id 1593461

- Location of start within SEQ ID NO 750: at 21 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 389

- gi No. 4884033 

- Description: (AJ133753) peptide methionine sulfoxide reductase 
[Arabidopsis thaliana] >gi|4884035|emb|CAB43187.1| (AJ133754) peptide
methionine sulfoxide reductase [Arabidopsis thaliana] 

- % Identity: 76.2 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 752: from 35 to 164 



Maximum Length Sequence: 

related to: 
Clone IDs: 

269234 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 753 

- Ceres seq_id 1593489 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 754 

- Ceres seq_id 1593490 

- Location of start within SEQ ID NO 753: at 43 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 390

- gi No. 2246378 

- Description: (Z86094) plastid protein [Arabidopsis thali 

- % Identity: 84 

- Alignment Length: 106 

- Location of Alignment in SEQ ID NO 754: from 52 to 157 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 755 

- Ceres seq_id 1593491

- Location of start within SEQ ID NO 753: at 220 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 391 

- gi No. 2246378 

- Description: (Z86094) plastid protein [Arabidopsis thaliana]

- % Identity: 84 

- Alignment Length: 106 

- Location of Alignment in SEQ ID NO 755: from 1 to 98 



Maximum Length Sequence: 

related to: 
Clone IDs: 

254378 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 756 

- Ceres seq_id 1593502 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 757 

- Ceres seq_id 1593503 

- Location of start within SEQ ID NO 756: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 392

- gi No. 3123331 

- Description: (AJ005930) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 70.5

- Alignment Length: 61

- Location of Alignment in SEQ ID NO 757: from 53 to 113



Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 758

- Ceres seq_id 1593571
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 759

- Ceres seq_id 1593572

- Location of start within SEQ ID NO 758: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Oleosin

- Location within SEQ ID NO 759: from 35 to 152 aa.
(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 760

- Ceres seq_id 1593608
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 761

- Ceres seq_id 1593609

- Location of start within SEQ ID NO 760: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 761: from 76 to 229 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 762

- Ceres seq_id 1593622 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 763

- Ceres seq_id 1593623

- Location of start within SEQ ID NO 762: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450

- Location within SEQ ID NO 763: from 1 to 378 aa. 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 764

- Ceres seq_id 1593641 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 765 

- Ceres seq_id 1593642 

- Location of start within SEQ ID NO 764: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sugar (and other) transporter 

- Location within SEQ ID NO 765: from 53 to 424 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 393 

- gi No. 4836903 

- Description: (AC007369) Similar to phosphate transporter proteins 
[Arabidopsis thaliana] 

- % Identity: 92.6 

- Alignment Length: 108

- Location of Alignment in SEQ ID NO 765: from 53 to 156

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 766

- Ceres seq_id 1593644

- Location of start within SEQ ID NO 764: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sugar (and other) transporter 

- Location within SEQ ID NO 766: from 27 to 398 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 394



Attorney Docket No. 2750-1237P Table 1 

Client Docket No. 80146.003 Page 168 



- gi No. 4836903 

- Description: (AC007369) Similar to phosphate transporter proteins
[Arabidopsis thaliana] 

- % Identity: 92.6 

- Alignment Length: 108

- Location of Alignment in SEQ ID NO 766: from 27 to 130 



Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 767

- Ceres seq_id 1593649
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 768

- Ceres seq_id 1593650 

- Location of start within SEQ ID NO 767: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sugar (and other) transporter 

- Location within SEQ ID NO 768: from 53 to 156 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 395 

- gi No. 4836903 

- Description: (AC007369) Similar to phosphate transporter proteins 
[Arabidopsis thaliana] 

- % Identity: 92.7 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 768: from 53 to 161

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 769

- Ceres seq_id 1593652

- Location of start within SEQ ID NO 767: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Sugar (and other) transporter 

- Location within SEQ ID NO 769: from 27 to 130 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 396 

- gi No. 4836903 

- Description: (AC007369) Similar to phosphate transporter proteins 
[Arabidopsis thaliana] 

- % Identity: 92.7 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 769: from 27 to 135 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 770

- Ceres seq_id 1593663 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 771 

- Ceres seq_id 1593664 

- Location of start within SEQ ID NO 770: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 397 

- gi No. 481132 

- Description: sucrose transport protein SUC1 - Arabidopsis 
thaliana >gi|407094|emb|CAA53147| (X75365) sucrose-proton symporter
[Arabidopsis thaliana] 

- % Identity: 75.7 

- Alignment Length: 37 

- Location of Alignment in SEQ ID NO 771: from 1 to 37 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 772 

- Ceres seq_id 1593707 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 773 

- Ceres seq_id 1593708 

- Location of start within SEQ ID NO 772: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 398 

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 87 

- Alignment Length: 368 

- Location of Alignment in SEQ ID NO 773: from 17 to 378 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 774 

- Ceres seq_id 1593710 

- Location of start within SEQ ID NO 772: at 85 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 399 

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 87 

- Alignment Length: 368 

- Location of Alignment in SEQ ID NO 774: from 1 to 350 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 775 

- Ceres seq_id 1593711 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 776

- Ceres seq_id 1593712

- Location of start within SEQ ID NO 775: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain

- Location within SEQ ID NO 776: from 139 to 250 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 777 

- Ceres seq_id 1593713 

- Location of start within SEQ ID NO 775: at 10 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 777: from 136 to 247 aa.



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 778

- Ceres seq_id 1593714

- Location of start within SEQ ID NO 775: at 127 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 778: from 97 to 208 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 779 

- Ceres seq_id 1593751
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 780

- Ceres seq_id 1593752

- Location of start within SEQ ID NO 779: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 400

- gi No. 3377834 

- Description: (AF075598) No definition line found [Arabidopsis
thaliana]

- % Identity: 94.2 

- Alignment Length: 69 

- Location of Alignment in SEQ ID NO 780: from 1 to 69 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 781 

- Ceres seq_id 1593753 

- Location of start within SEQ ID NO 779: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 782

- Ceres seq_id 1593754

- Location of start within SEQ ID NO 779: at 37 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 401 

- gi No. 3377834 

- Description: (AF075598) No definition line found [Arabidopsis
thaliana]

- % Identity: 94.2 

- Alignment Length: 69 

- Location of Alignment in SEQ ID NO 782: from 1 to 57
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 783 

- Ceres seq_id 1593769 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 784 

- Ceres seq_id 1593770 

- Location of start within SEQ ID NO 783: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 402 

- gi No. 4038061 

- Description: (AC005897) similar to leucine zipper transcription 
factors [Arabidopsis thaliana] 

- % Identity: 79.5

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 784: from 1 to 121 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 785

- Ceres seq_id 1593773 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 786 

- Ceres seq_id 1593774 

- Location of start within SEQ ID NO 785: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 786: from 88 to 134 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 787 

- Ceres seq_id 1593775 

- Location of start within SEQ ID NO 785: at 44 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 787: from 74 to 120 aa.

(Dp) Related Amino Acid Sequences



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 788 

- Ceres seq_id 1593776 

- Location of start within SEQ ID NO 785: at 77 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 788: from 63 to 109 aa.



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 789 

- Ceres seq_id 1593781 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 790

- Ceres seq_id 1593782

- Location of start within SEQ ID NO 789: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 790: from 3 to 146 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 403

- gi No. 99806 

- Description: extensin - rape 

- % Identity: 81.5 

- Alignment Length: 272 

- Location of Alignment in SEQ ID NO 790: from 1 to 264



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 791

- Ceres seq_id 1593809
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 792

- Ceres seq_id 1593810

- Location of start within SEQ ID NO 791: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 404

- gi No. 5459298 

- Description: (Y17722) telomere repeat-binding protein TRP1 
[Arabidopsis thaliana] 

- % Identity: 72.7 

- Alignment Length: 55 

- Location of Alignment in SEQ ID NO 792: from 127 to 181 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 793

- Ceres seq_id 1593811

- Location of start within SEQ ID NO 791: at 222 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 405

- gi No. 5459298 

- Description: (Y17722) telomere repeat-binding protein TRP1
[Arabidopsis thaliana] 

- % Identity: 72.7 

- Alignment Length: 55 

- Location of Alignment in SEQ ID NO 793: from 54 to 108 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 794

- Ceres seq_id 1593815
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 795

- Ceres seq_id 1593816

- Location of start within SEQ ID NO 794: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 795: from 1 to 166 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 406

- gi No. 3236253 

- Description: (AC004684) receptor-like protein kinase [Arabidopsis
thaliana]

- % Identity: 79.5 

- Alignment Length: 224 

- Location of Alignment in SEQ ID NO 795: from 1 to 224 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 796

- Ceres seq_id 1593817
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 797

- Ceres seq_id 1593818

- Location of start within SEQ ID NO 796: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Phosphatidylinositol 3- and 4-kinases 

- Location within SEQ ID NO 797: from 113 to 206 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 407

- gi No. 4467359 

- Description: (AJ002685) Phosphatidylinositol 4-kinase 
[Arabidopsis thaliana] 

- % Identity: 89.8 

- Alignment Length: 206

- Location of Alignment in SEQ ID NO 797: from 1 to 206

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 798

- Ceres seq_id 1593819

- Location of start within SEQ ID NO 796: at 50 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Phosphatidylinositol 3- and 4-kinases 

- Location within SEQ ID NO 798: from 97 to 190 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 408 

- gi No. 4467359 

- Description: (AJ002685) Phosphatidylinositol 4-kinase 
[Arabidopsis thaliana] 

- % Identity: 89.8 

- Alignment Length: 206 

- Location of Alignment in SEQ ID NO 798: from 1 to 190 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 799

- Ceres seq_id 1593820

- Location of start within SEQ ID NO 796: at 212 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Phosphatidylinositol 3- and 4-kinases 

- Location within SEQ ID NO 799: from 43 to 136 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 409 

- gi No. 4467359 

- Description: (AJ002685) Phosphatidylinositol 4-kinase 
[Arabidopsis thaliana] 

- % Identity: 89.8 

- Alignment Length: 206

- Location of Alignment in SEQ ID NO 799: from 1 to 136 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 800 

- Ceres seq_id 1593869 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 801 

- Ceres seq_id 1593870 

- Location of start within SEQ ID NO 800: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Prokaryotic-type class I peptide chain release factors 

- Location within SEQ ID NO 801: from 74 to 185 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 802 

- Ceres seq_id 1593872

- Location of start within SEQ ID NO 800: at 10 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Prokaryotic-type class I peptide chain release factors

- Location within SEQ ID NO 802: from 71 to 182 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 803 

- Ceres seq_id 1593876 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 804

- Ceres seq_id 1593877

- Location of start within SEQ ID NO 803: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 410 

- gi No. 4914324 

- Description: (AC005489) F14N23.10 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 109

- Location of Alignment in SEQ ID NO 804: from 27 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 805 

- Ceres seq_id 1593879 

- Location of start within SEQ ID NO 803: at 82 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 411 

- gi No. 4914324 

- Description: (AC005489) F14N23.10 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 109

- Location of Alignment in SEQ ID NO 805: from 1 to 108 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 806 

- Ceres seq_id 1593888 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 807 

- Ceres seq_id 1593889 

- Location of start within SEQ ID NO 806: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 412 

- gi No. 4914324 

- Description: (AC005489) F14N23.10 [Arabidopsis thaliana] 

- % Identity: 97.9 

- Alignment Length: 47 

- Location of Alignment in SEQ ID NO 807: from 27 to 73

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 808 

- Ceres seq_id 1593891 

- Location of start within SEQ ID NO 806: at 82 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 413 

- gi No. 4914324 

- Description: (AC005489) F14N23.10 [Arabidopsis thaliana] 

- % Identity: 97.9 

- Alignment Length: 47

- Location of Alignment in SEQ ID NO 808: from 1 to 46
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 809 

- Ceres seq_id 1594065
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 810 

- Ceres seq_id 1594066 

- Location of start within SEQ ID NO 809: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 810: from 266 to 331 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 811 

- Ceres seq_id 1594068 

- Location of start within SEQ ID NO 809: at 4 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 811: from 265 to 330 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 812 

- Ceres seq_id 1594107 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 813 

- Ceres seq_id 1594108

- Location of start within SEQ ID NO 812: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 814

- Ceres seq_id 1594109

- Location of start within SEQ ID NO 812: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 414 

- gi No. 2191183 

- Description: (AF007271) similar to the ligand-gated ionic 
channels family [Arabidopsis thaliana] 

- % Identity: 80 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 814: from 2 to 16 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 815 

- Ceres seq_id 1594130 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 816 

- Ceres seq_id 1594131 

- Location of start within SEQ ID NO 815: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Retroviral aspartyl proteases 

- Location within SEQ ID NO 816: from 406 to 490 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 817 

- Ceres seq_id 1594133 

- Location of start within SEQ ID NO 815: at 151 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Retroviral aspartyl proteases 

- Location within SEQ ID NO 817: from 356 to 440 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 818 

- Ceres seq_id 1594134 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 819 

- Ceres seq_id 1594135 

- Location of start within SEQ ID NO 818: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Retroviral aspartyl proteases 

- Location within SEQ ID NO 819: from 299 to 383 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 415 

- gi No. 99721 
- Description: retrovirus-related polyprotein - Arabidopsis
thaliana retrotransposon Ta1-3 >gi|16534|emb|CAA31653| (X13291) polyprotein
[Arabidopsis thaliana]

- % Identity: 72.1 

- Alignment Length: 61 

- Location of Alignment in SEQ ID NO 819: from 425 to 485 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 820 

- Ceres seq_id 1594137 

- Location of start within SEQ ID NO 818: at 151 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Retroviral aspartyl proteases 

- Location within SEQ ID NO 820: from 249 to 333 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 416 

- gi No. 99721 

- Description: retrovirus-related polyprotein - Arabidopsis
thaliana retrotransposon Ta1-3 >gi|16534|emb|CAA31653| (X13291) polyprotein
[Arabidopsis thaliana]

- % Identity: 72.1 

- Alignment Length: 61 

- Location of Alignment in SEQ ID NO 820: from 375 to 435 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 821 

- Ceres seq_id 1594182 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 822 

- Ceres seq_id 1594183 

- Location of start within SEQ ID NO 821: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cytochrome P450 

- Location within SEQ ID NO 822: from 30 to 454 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 417 

- gi No. 1345644 

- Description: CYTOCHROME P450 86A1 (CYPLXXXVI)
>gi|940446|emb|CAA62082| (X90458) cytochrome p450 [Arabidopsis thaliana]

- % Identity: 93.9 

- Alignment Length: 461

- Location of Alignment in SEQ ID NO 822: from 1 to 454 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 823 

- Ceres seq_id 1594185 

- Location of start within SEQ ID NO 821: at 214 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cytochrome P450 

- Location within SEQ ID NO 823: from 1 to 383 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 418 

- gi No. 1345644 

- Description: CYTOCHROME P450 86A1 (CYPLXXXVI)
>gi|940446|emb|CAA62082| (X90458) cytochrome p450 [Arabidopsis thaliana]

- % Identity: 93.9 

- Alignment Length: 461

- Location of Alignment in SEQ ID NO 823: from 1 to 383
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 824 

- Ceres seq_id 1594192 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 825 

- Ceres seq_id 1594193 

- Location of start within SEQ ID NO 824: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 419 

- gi No. 3319360 

- Description: (AF077408) contains similarity to Vicia faba
retrotransposon-like gene (GB:AB007467) [Arabidopsis thaliana]

- % Identity: 78.6 

- Alignment Length: 42 

- Location of Alignment in SEQ ID NO 825: from 1 to 42 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 826 

- Ceres seq_id 1594249 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 827 

- Ceres seq_id 1594250 

- Location of start within SEQ ID NO 826: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 420

- gi No. 4850396 

- Description: (AC007357) F3F19.15 [Arabidopsis thaliana] 

- % Identity: 83.9 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 827: from 1 to 111 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 828 

- Ceres seq_id 1594252 

- Location of start within SEQ ID NO 826: at 154 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 421 

- gi No. 4850396 

- Description: (AC007357) F3F19.15 [Arabidopsis thaliana] 

- % Identity: 83.9

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 828: from 1 to 60 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 829 

- Ceres seq_id 1594282 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 830 

- Ceres seq_id 1594283 

- Location of start within SEQ ID NO 829: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- SCP-like extracellular protein 

- Location within SEQ ID NO 830: from 46 to 143 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 422 

- gi No. 224801 

- Description: protein 1a, pathogenesis related [Nicotiana sp.]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 830: from 137 to 148
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 831 

- Ceres seq_id 1594285
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 832 

- Ceres seq_id 1594286 

- Location of start within SEQ ID NO 831: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SCP-like extracellular protein 

- Location within SEQ ID NO 832: from 54 to 186 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 833 

- Ceres seq_id 1594332 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 834 

- Ceres seq_id 1594333 

- Location of start within SEQ ID NO 833: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 834: from 11 to 348 aa.

- gi No. 2146727

- Description: cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) CAD1
Arabidopsis thaliana (fragment) >gi|598069 (L37884) cinnamyl-alcohol
dehydrogenase [Arabidopsis thaliana]

- % Identity: 81.2 

- Alignment Length: 351

- Location of Alignment in SEQ ID NO 834: from 5 to 355 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 835 

- Ceres seq_id 1594335 

- Location of start within SEQ ID NO 833: at 508 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 835: from 1 to 179 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 424

- gi No. 2146727 

- Description: cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) CAD1
Arabidopsis thaliana (fragment) >gi|598069 (L37884) cinnamyl-alcohol
dehydrogenase [Arabidopsis thaliana]

- % Identity: 81.2 

- Alignment Length: 351 

- Location of Alignment in SEQ ID NO 835: from 1 to 186 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 836 

- Ceres seq_id 1594437 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 837 

- Ceres seq_id 1594438 

- Location of start within SEQ ID NO 836: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 425

- gi No. 5733883 

- Description: (AC007932) F11A17.19 [Arabidopsis thaliana]

- % Identity: 94.3 

- Alignment Length: 333 

- Location of Alignment in SEQ ID NO 837: from 1 to 332 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 838

- Ceres seq_id 1594440

- Location of start within SEQ ID NO 836: at 31 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 426 

- gi No. 5733883 

- Description: (AC007932) F11A17.19 [Arabidopsis thaliana] 

- % Identity: 94.3 

- Alignment Length: 333 

- Location of Alignment in SEQ ID NO 838: from 1 to 322
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 839 

- Ceres seq_id 1594465
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 840

- Ceres seq_id 1594466

- Location of start within SEQ ID NO 839: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 427

- gi No. 2576361 

- Description: (U39782) lysine and histidine specific transporter 
[Arabidopsis thaliana] 

- % Identity: 72.6 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 840: from 2 to 85 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 841 

- Ceres seq_id 1594467 

- Location of start within SEQ ID NO 839: at 9 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 428

- gi No. 2576361 

- Description: (U39782) lysine and histidine specific transporter 
[Arabidopsis thaliana] 

- % Identity: 72.6 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 841: from 1 to 83 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 842 

- Ceres seq_id 1594468 

- Location of start within SEQ ID NO 839: at 24 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 429

- gi No. 2576361

- Description: (U39782) lysine and histidine specific transporter
[Arabidopsis thaliana] 

- % Identity: 72.6 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 842: from 1 to 78 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 843 

- Ceres seq_id 1594479 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 844

- Ceres seq_id 1594480 

- Location of start within SEQ ID NO 843: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 430

- gi No. 2995990 

- Description: (AF053746) dormancy-associated protein [Arabidopsis
thaliana] >gi|2995992 (AF053747) dormancy-associated protein [Arabidopsis
thaliana]

- % Identity: 94.4 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 844: from 7 to 24 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 845 

- Ceres seq_id 1594481 

- Location of start within SEQ ID NO 843: at 55 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 431

- gi No. 2995990 

- Description: (AF053746) dormancy-associated protein [Arabidopsis
thaliana] >gi|2995992 (AF053747) dormancy-associated protein [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 845: from 26 to 36 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 846

- Ceres seq_id 1594517 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 847 

- Ceres seq_id 1594518 

- Location of start within SEQ ID NO 846: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 432

- gi No. 4325349 

- Description: (AF128394) contains similarity to Petunia PTTA T 
(GB:AF009516) [Arabidopsis thaliana] 

- % Identity: 92.9 

- Alignment Length: 155 

- Location of Alignment in SEQ ID NO 847: from 1 to 155 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 848 

- Ceres seq_id 1594520 

- Location of start within SEQ ID NO 846: at 718 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 433

- gi No. 4325349 

- Description: (AF128394) contains similarity to Petunia PTTA F 
(GB:AF009516) [Arabidopsis thaliana] 

- % Identity: 94.4 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 848: from 1 to 109 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 849 

- Ceres seq_id 1594543 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 850 

- Ceres seq_id 1594544 

- Location of start within SEQ ID NO 849: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- NB-ARC domain 

- Location within SEQ ID NO 850: from 141 to 273 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 851 

- Ceres seq_id 1594546 

- Location of start within SEQ ID NO 849: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 851: from 102 to 234 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 852 

- Ceres seq_id 1594562 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 853 

- Ceres seq_id 1594563 

- Location of start within SEQ ID NO 852: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Forkhead-associated (FHA) domain 

- Location within SEQ ID NO 853: from 12 to 79 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 854 

- Ceres seq_id 1594667 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 855

- Ceres seq_id 1594668 

- Location of start within SEQ ID NO 854: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ABC transporter 

- Location within SEQ ID NO 855: from 79 to 271 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 856 

- Ceres seq_id 1594670 

- Location of start within SEQ ID NO 854: at 28 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ABC transporter 

- Location within SEQ ID NO 856: from 70 to 262 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 857 

- Ceres seq_id 1594695 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 858 

- Ceres seq_id 1594696 

- Location of start within SEQ ID NO 857: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 434 

- gi No. 283004 

- Description: DNA-binding protein Gt-2 - rice
>gi|20249|emb|CAA48328| (X68261) gt-2 [Oryza sativa]

- % Identity: 72.9 

- Alignment Length: 48

- Location of Alignment in SEQ ID NO 858: from 5 to 52 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 859 

- Ceres seq_id 1594697 

- Location of start within SEQ ID NO 857: at 5 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 435

- gi No. 283004 

- Description: DNA-binding protein Gt-2 - rice
>gi|20249|emb|CAA48328| (X68261) gt-2 [Oryza sativa]

- % Identity: 72.9 

- Alignment Length: 48 

- Location of Alignment in SEQ ID NO 859: from 4 to 51 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 860

- Ceres seq_id 1594698 

- Location of start within SEQ ID NO 857: at 206 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 861

- Ceres seq_id 1594699 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 862

- Ceres seq_id 1594700 

- Location of start within SEQ ID NO 861: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 436

- gi No. 283004 

- Description: DNA-binding protein Gt-2 - rice
>gi|20249|emb|CAA48328| (X68261) gt-2 [Oryza sativa]

- % Identity: 72.9 

- Alignment Length: 48

- Location of Alignment in SEQ ID NO 862: from 5 to 52

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 863

- Ceres seq_id 1594701 

- Location of start within SEQ ID NO 861: at 5 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 437

- gi No. 283004 

- Description: DNA-binding protein Gt-2 - rice
>gi|20249|emb|CAA48328| (X68261) gt-2 [Oryza sativa]

- % Identity: 72.9 

- Alignment Length: 48

- Location of Alignment in SEQ ID NO 863: from 4 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 864

- Ceres seq_id 1594702

- Location of start within SEQ ID NO 861: at 206 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 865 

- Ceres seq_id 1594703 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 866

- Ceres seq_id 1594704

- Location of start within SEQ ID NO 865: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 438

- gi No. 3249065 

- Description: (AC004473) Similar to HAK1 gb|U22945 high affinity
potassium transporter from Schwanniomyces occidentalis. [Arabidopsis
thaliana]

- % Identity: 72.7 

- Alignment Length: 44 

- Location of Alignment in SEQ ID NO 866: from 122 to 165 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 867 

- Ceres seq_id 1594705

- Location of start within SEQ ID NO 865: at 21 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 439 

- gi No. 3249065 

- Description: (AC004473) Similar to HAK1 gb|U22945 high affinity 
potassium transporter from Schwanniomyces occidentalis. [Arabidopsis 
thaliana] 

- % Identity: 72.7 

- Alignment Length: 44

- Location of Alignment in SEQ ID NO 867: from 116 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 868

- Ceres seq_id 1594706

- Location of start within SEQ ID NO 865: at 24 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 440

- gi No. 3249065 

- Description: (AC004473) Similar to HAK1 gb|U22945 high affinity 
potassium transporter from Schwanniomyces occidentalis. [Arabidopsis 
thaliana] 

- % Identity: 72.7 

- Alignment Length: 44

- Location of Alignment in SEQ ID NO 868: from 115 to 158
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 869 

- Ceres seq_id 1594752
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 870

- Ceres seq_id 1594753

- Location of start within SEQ ID NO 869: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 441 

- gi No. 4056432 

- Description: (AC005990) Similar to gi|2245014 glucosyltransferase
homolog from Arabidopsis thaliana chromosome 4 contig gb|Z97341. ESTs
gb|T20778 and gb|AA586281 come from this gene. [Arabidopsis thaliana]

- % Identity: 73.1 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 870: from 305 to 356 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 871 

- Ceres seq_id 1594755 

- Location of start within SEQ ID NO 869: at 529 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 442

- gi No. 4056432 

- Description: (AC005990) Similar to gi|2245014 glucosyltransferase
homolog from Arabidopsis thaliana chromosome 4 contig gb|Z97341. ESTs
gb|T20778 and gb|AA586281 come from this gene. [Arabidopsis thaliana]

- % Identity: 73.1 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 871: from 129 to 180 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 872 

- Ceres seq_id 1594760
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 873 

- Ceres seq_id 1594761 

- Location of start within SEQ ID NO 872: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 443

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons; 
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 77.6 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 873: from 42 to 90 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 874 

- Ceres seq_id 1594763 

- Location of start within SEQ ID NO 872: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 
- Pat. Appln. SEQ ID NO 875

- Ceres seq_id 1594807
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 876

- Ceres seq_id 1594808 

- Location of start within SEQ ID NO 875: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 877 

- Ceres seq_id 1594810 

- Location of start within SEQ ID NO 875: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 444

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 80 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 877: from 1 to 15 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 878

- Ceres seq_id 1594833 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 879

- Ceres seq_id 1594834 

- Location of start within SEQ ID NO 878: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L11

- Location within SEQ ID NO 879: from 43 to 188 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 445

- gi No. 1710483 

- Description: 50S RIBOSOMAL PROTEIN L11 >gi|1213239|emb|CAA65166|
(X95916) ribosomal protein L11 [Streptomyces galbus]

- % Identity: 70 

- Alignment Length: 40

- Location of Alignment in SEQ ID NO 879: from 35 to 74 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 880 

- Ceres seq_id 1594836 

- Location of start within SEQ ID NO 878: at 208 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L11

- Location within SEQ ID NO 880: from 1 to 119 aa.



(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 881

- Ceres seq_id 1594860
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 882 

- Ceres seq_id 1594861 

- Location of start within SEQ ID NO 881: at 138 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L6 

- Location within SEQ ID NO 882: from 9 to 88 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 883 

- Ceres seq_id 1594936 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 884 

- Ceres seq_id 1594937 

- Location of start within SEQ ID NO 883: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 885 

- Ceres seq_id 1594938 

- Location of start within SEQ ID NO 883: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 446 

- gi No. 1174867 

- Description: UBIQUINOL-CYTOCHROME C REDUCTASE COMPLEX UBIQUINONE- 
BINDING PROTEIN QP-C (UBIQUINOL-CYTOCHROME C REDUCTASE COMPLEX 8.2 KD 
PROTEIN) >gi|633687|emb|CAA55862| (X79275) ubiquinol--cytochrome c reductase
[Solanum tuberosum] [Solanum tuberosum] 

- % Identity: 80.6 

- Alignment Length: 72 

- Location of Alignment in SEQ ID NO 885: from 12 to 83 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 886 

- Ceres seq_id 1594939 

- Location of start within SEQ ID NO 883: at 35 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 447

- gi No. 1174867 

- Description: UBIQUINOL-CYTOCHROME C REDUCTASE COMPLEX UBIQUINONE-
BINDING PROTEIN QP-C (UBIQUINOL-CYTOCHROME C REDUCTASE COMPLEX 8.2 KD
PROTEIN) >gi|633687|emb|CAA55862| (X79275) ubiquinol--cytochrome c reductase
[Solanum tuberosum] [Solanum tuberosum] 

- % Identity: 80.6 

- Alignment Length: 72 

- Location of Alignment in SEQ ID NO 886: from 1 to 72
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 887 

- Ceres seq_id 1594971 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 888 

- Ceres seq_id 1594972 

- Location of start within SEQ ID NO 887: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 448

- gi No. 2995990 

- Description: (AF053746) dormancy-associated protein [Arabidopsis
thaliana] >gi|2995992 (AF053747) dormancy-associated protein [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 888: from 1 to 17 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 889 

- Ceres seq_id 1595173 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 890

- Ceres seq_id 1595174

- Location of start within SEQ ID NO 889: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 449

- gi No. 2995990 

- Description: (AF053746) dormancy-associated protein [Arabidopsis
thaliana] >gi|2995992 (AF053747) dormancy-associated protein [Arabidopsis
thaliana]

- % Identity: 100 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 890: from 1 to 17

Maximum Length Sequence: 

related to: 
Clone IDs: 
92604 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 891 

- Ceres seq_id 1595550 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 892






- Ceres seq_id 1595551

- Location of start within SEQ ID NO 891: at 92 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Polygalacturonase (pectinase) 

- Location within SEQ ID NO 892: from 98 to 191 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 450 

- gi No. 421833 

- Description: exopolygalacturonase (clone GBGa302) - Arabidopsis
thaliana >gi|311962|emb|CAA51692| (X73222) exopolygalacturonase [Arabidopsis
thaliana]

- % Identity: 99.5 

- Alignment Length: 192 

- Location of Alignment in SEQ ID NO 892: from 1 to 191 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 893

- Ceres seq_id 1595552 

- Location of start within SEQ ID NO 891: at 143 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Polygalacturonase (pectinase) 

- Location within SEQ ID NO 893: from 81 to 174 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 451

- gi No. 421833 

- Description: exopolygalacturonase (clone GBGa302) - Arabidopsis
thaliana >gi|311962|emb|CAA51692| (X73222) exopolygalacturonase [Arabidopsis
thaliana]

- % Identity: 99.5 

- Alignment Length: 192 

- Location of Alignment in SEQ ID NO 893: from 1 to 174

Maximum Length Sequence: 

related to: 
Clone IDs: 

100499 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 894

- Ceres seq_id 1595563
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 895

- Ceres seq_id 1595564

- Location of start within SEQ ID NO 894: at 111 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 452 

- gi No. 2342734 

- Description: (AC002341) DNA-binding protein isolog [Arabidopsis
thaliana]

- % Identity: 78.1 

- Alignment Length: 172

- Location of Alignment in SEQ ID NO 895: from 1 to 169 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 896

- Ceres seq_id 1595565

- Location of start within SEQ ID NO 894: at 318 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 453

- gi No. 2342734 

- Description: (AC002341) DNA-binding protein isolog [Arabidopsis
thaliana]

- % Identity: 78.1 

- Alignment Length: 172

- Location of Alignment in SEQ ID NO 896: from 1 to 100 

Maximum Length Sequence: 

related to: 
Clone IDs: 

114858 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 897

- Ceres seq_id 1595581
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 898

- Ceres seq_id 1595582

- Location of start within SEQ ID NO 897: at 178 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 454 

- gi No. 1279640 

- Description: (X92204) NAM [Petunia x hybrida] 

- % Identity: 77.5 

- Alignment Length: 144

- Location of Alignment in SEQ ID NO 898: from 1 to 142 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 899

- Ceres seq_id 1595583 

- Location of start within SEQ ID NO 897: at 217 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 455

- gi No. 1279640 

- Description: (X92204) NAM [Petunia x hybrida] 

- % Identity: 77.5 

- Alignment Length: 144

- Location of Alignment in SEQ ID NO 899: from 1 to 129 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 900 

- Ceres seq_id 1595584 

- Location of start within SEQ ID NO 897: at 367 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 456

- gi No. 1279640 

- Description: (X92204) NAM [Petunia x hybrida] 

- % Identity: 77.5 

- Alignment Length: 144

- Location of Alignment in SEQ ID NO 900: from 1 to 79

Maximum Length Sequence:

related to: 
Clone IDs: 

116843 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 901 

- Ceres seq_id 1595585 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 902 

- Ceres seq_id 1595586 

- Location of start within SEQ ID NO 901: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Zinc finger, C3HC4 type (RING finger)

- Location within SEQ ID NO 902: from 335 to 375 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

126472 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 903 

- Ceres seq_id 1595599 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 904 

- Ceres seq_id 1595600 

- Location of start within SEQ ID NO 903: at 198 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Intermediate filament proteins 

- Location within SEQ ID NO 904: from 191 to 398 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 905 

- Ceres seq_id 1595601

- Location of start within SEQ ID NO 903: at 441 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Intermediate filament proteins 

- Location within SEQ ID NO 905: from 110 to 317 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 906

- Ceres seq_id 1595602

- Location of start within SEQ ID NO 903: at 732 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Intermediate filament proteins 

- Location within SEQ ID NO 906: from 13 to 220 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

143608 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 907 

- Ceres seq_id 1595607 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 908 

- Ceres seq_id 1595608 

- Location of start within SEQ ID NO 907: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 457 

- gi No. 1817544 

- Description: (D83025) proline oxidase precursor [Arabidopsis thaliana]

- % Identity: 99.4

- Alignment Length: 499

- Location of Alignment in SEQ ID NO 908: from 40 to 538 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 909 

- Ceres seq_id 1595609 

- Location of start within SEQ ID NO 907: at 120 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 458

- gi No. 1817544

- Description: (D83025) proline oxidase precursor [Arabidopsis thaliana]

- % Identity: 99.4

- Alignment Length: 499

- Location of Alignment in SEQ ID NO 909: from 1 to 499

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 910 

- Ceres seq_id 1595610 

- Location of start within SEQ ID NO 907: at 402 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 459

- gi No. 1817544

- Description: (D83025) proline oxidase precursor [Arabidopsis thaliana]

- % Identity: 99.4

- Alignment Length: 499

- Location of Alignment in SEQ ID NO 910: from 1 to 405



Maximum Length Sequence: 

related to: 
Clone IDs: 
24845 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 911 

- Ceres seq_id 1595619 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 912 

- Ceres seq_id 1595620 

- Location of start within SEQ ID NO 911: at 51 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Prion protein 

- Location within SEQ ID NO 912: from 39 to 144 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 460

- gi No. 320976 

- Description: eggshell protein - fluke (Schistosoma haematobium) 
(subclone SH.E 2-1) 

- % Identity: 72.7 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 912: from 42 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 913 

- Ceres seq_id 1595621 

- Location of start within SEQ ID NO 911: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Prion protein 

- Location within SEQ ID NO 913: from 11 to 116 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 461

- gi No. 320976 

- Description: eggshell protein - fluke (Schistosoma haematobium) 
(subclone SH.E 2-1) 

- % Identity: 72.7 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 913: from 14 to 108 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 914 

- Ceres seq_id 1595622 

- Location of start within SEQ ID NO 911: at 178 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

147806 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 915

- Ceres seq_id 1595623 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 916 

- Ceres seq_id 1595624 

- Location of start within SEQ ID NO 915: at 193 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Tropomyosins

- Location within SEQ ID NO 916: from 82 to 133 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 462

- gi No. 2191168 

- Description: (AF007270) contains similarity to myosin heavy chain 
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 916: from 1 to 140

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 917 

- Ceres seq_id 1595625 

- Location of start within SEQ ID NO 915: at 714 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tropomyosins 

- Location within SEQ ID NO 917: from 1 to 134 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 463

- gi No. 2191168 

- Description: (AF007270) contains similarity to myosin heavy chain 
[Arabidopsis thaliana] 

- % Identity: 94.4 

- Alignment Length: 195 

- Location of Alignment in SEQ ID NO 917: from 1 to 152 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 918 

- Ceres seq_id 1595626 

- Location of start within SEQ ID NO 915: at 822 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tropomyosins 

- Location within SEQ ID NO 918: from 1 to 98 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 464

- gi No. 2191168 

- Description: (AF007270) contains similarity to myosin heavy chain 
[Arabidopsis thaliana] 

- % Identity: 94.4 

- Alignment Length: 195 

- Location of Alignment in SEQ ID NO 918: from 1 to 116 



Maximum Length Sequence: 
related to:
Clone IDs:

148018 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 919 

- Ceres seq_id 1595627
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 920 

- Ceres seq_id 1595628 

- Location of start within SEQ ID NO 919: at 154 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 465

- gi No. 2252860

- Description: (AF013294) No definition line found [Arabidopsis thaliana]

- % Identity: 74.4

- Alignment Length: 206

- Location of Alignment in SEQ ID NO 920: from 55 to 249

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 921 

- Ceres seq_id 1595629 

- Location of start within SEQ ID NO 919: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 466

- gi No. 2252860

- Description: (AF013294) No definition line found [Arabidopsis thaliana]

- % Identity: 74.4

- Alignment Length: 206

- Location of Alignment in SEQ ID NO 921: from 50 to 244 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 922 

- Ceres seq_id 1595630 

- Location of start within SEQ ID NO 919: at 313 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 467

- gi No. 2252860

- Description: (AF013294) No definition line found [Arabidopsis thaliana]

- % Identity: 74.4

- Alignment Length: 206

- Location of Alignment in SEQ ID NO 922: from 2 to 196 

Maximum Length Sequence:

related to: 
Clone IDs: 

156655 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 923 

- Ceres seq_id 1595678 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 924

- Ceres seq_id 1595679 

- Location of start within SEQ ID NO 923: at 296 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Tropomyosins

- Location within SEQ ID NO 924: from 63 to 259 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

158588 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 925 

- Ceres seq_id 1595695 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 926 

- Ceres seq_id 1595696 

- Location of start within SEQ ID NO 925: at 86 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Lipase/Acylhydrolase with GDSL-like motif 

- Location within SEQ ID NO 926: from 30 to 81 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 927 

- Ceres seq_id 1595697 

- Location of start within SEQ ID NO 925: at 391 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 928 

- Ceres seq_id 1595698 

- Location of start within SEQ ID NO 925: at 439 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 
21428 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 929 

- Ceres seq_id 1595707 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 930 

- Ceres seq_id 1595708 

- Location of start within SEQ ID NO 929: at 196 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Asparagine synthase 

- Location within SEQ ID NO 930: from 167 to 536 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 468

- gi No. 1351988

- Description: ASPARAGINE SYNTHETASE (GLUTAMINE-HYDROLYZING)
(GLUTAMINE-DEPENDENT ASPARAGINE SYNTHETASE) >gi|1084354|pir||S52387
asparagine synthase (glutamine-hydrolyzing) (EC 6.3.5.4) - wild cabbage
>gi|669057|emb|CAA59138| (X84448) asparagine synthase

- % Identity: 92.3

- Alignment Length: 586

- Location of Alignment in SEQ ID NO 930: from 1 to 584 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 931 

- Ceres seq_id 1595709 

- Location of start within SEQ ID NO 929: at 541 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Asparagine synthase 

- Location within SEQ ID NO 931: from 52 to 421 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 469

- gi No. 1351988

- Description: ASPARAGINE SYNTHETASE (GLUTAMINE-HYDROLYZING)
(GLUTAMINE-DEPENDENT ASPARAGINE SYNTHETASE) >gi|1084354|pir||S52387
asparagine synthase (glutamine-hydrolyzing) (EC 6.3.5.4) - wild cabbage
>gi|669057|emb|CAA59138| (X84448) asparagine synthase

- % Identity: 92.3

- Alignment Length: 586

- Location of Alignment in SEQ ID NO 931: from 1 to 469



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 932 

- Ceres seq_id 1595710 

- Location of start within SEQ ID NO 929: at 595 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Asparagine synthase

- Location within SEQ ID NO 932: from 34 to 403 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 470 

- gi No. 1351988 

- Description: ASPARAGINE SYNTHETASE (GLUTAMINE-HYDROLYZING)
(GLUTAMINE-DEPENDENT ASPARAGINE SYNTHETASE) >gi|1084354|pir||S52387
asparagine synthase (glutamine-hydrolyzing) (EC 6.3.5.4) - wild cabbage
>gi|669057|emb|CAA59138| (X84448) asparagine synthase

- % Identity: 92.3

- Alignment Length: 586

- Location of Alignment in SEQ ID NO 932: from 1 to 451 



Maximum Length Sequence: 

related to:
Clone IDs:
40594

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 933 

- Ceres seq_id 1595725 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 934 

- Ceres seq_id 1595726 

- Location of start within SEQ ID NO 933: at 122 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 934: from 312 to 507 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 935 

- Ceres seq_id 1595727

- Location of start within SEQ ID NO 933: at 140 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 935: from 306 to 501 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 936

- Ceres seq_id 1595728

- Location of start within SEQ ID NO 933: at 143 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 936: from 305 to 500 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

230831 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 937 

- Ceres seq_id 1595741 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 938 

- Ceres seq_id 1595742 

- Location of start within SEQ ID NO 937: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 471 

- gi No. 4768281 

- Description: (AF085231) phytochelatin synthase 1 [Arabidopsis thaliana]

- % Identity: 97

- Alignment Length: 473

- Location of Alignment in SEQ ID NO 938: from 63 to 523

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 939

- Ceres seq_id 1595743

- Location of start within SEQ ID NO 937: at 117 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 472

- gi No. 4768281

- Description: (AF085231) phytochelatin synthase 1 [Arabidopsis thaliana]

- % Identity: 97

- Alignment Length: 473

- Location of Alignment in SEQ ID NO 939: from 25 to 485 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 940 

- Ceres seq_id 1595744

- Location of start within SEQ ID NO 937: at 123 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 473 

- gi No. 4768281 

- Description: (AF085231) phytochelatin synthase 1 [Arabidopsis thaliana]

- % Identity: 97

- Alignment Length: 473

- Location of Alignment in SEQ ID NO 940: from 23 to 483

Maximum Length Sequence:

related to: 
Clone IDs: 

231507 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 941 

- Ceres seq_id 1595748 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 942 

- Ceres seq_id 1595749 

- Location of start within SEQ ID NO 941: at 109 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 474

- gi No. 3935149 

- Description: (AC005106) T25N20.13 [Arabidopsis thaliana] 

- % Identity: 98 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 942: from 1 to 143 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 943 

- Ceres seq_id 1595750 

- Location of start within SEQ ID NO 941: at 181 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 475

- gi No. 3935149 

- Description: (AC005106) T25N20.13 [Arabidopsis thaliana] 

- % Identity: 98 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 943: from 1 to 119 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 944

- Ceres seq_id 1595751

- Location of start within SEQ ID NO 941: at 582 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 476

- gi No. 3935149 

- Description: (AC005106) T25N20.13 [Arabidopsis thaliana] 

- % Identity: 98.2 

- Alignment Length: 222 

- Location of Alignment in SEQ ID NO 944: from 1 to 203 

Maximum Length Sequence:

related to: 
Clone IDs: 

250386 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 945

- Ceres seq_id 1595760 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 946 

- Ceres seq_id 1595761 

- Location of start within SEQ ID NO 945: at 88 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 946: from 45 to 90 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 477 

- gi No. 3047079 

- Description: (AF058914) Arabidopsis thaliana transcription factor 
ATMYB4 (GB:X95297) [Arabidopsis thaliana] 

- % Identity: 84.7 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 946: from 23 to 107

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 947

- Ceres seq_id 1595762

- Location of start within SEQ ID NO 945: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 947: from 35 to 80 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 478

- gi No. 3047079 

- Description: (AF058914) Arabidopsis thaliana transcription factor 
ATMYB4 (GB:X95297) [Arabidopsis thaliana] 

- % Identity: 84.7 

- Alignment Length: 85

- Location of Alignment in SEQ ID NO 947: from 13 to 97 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 948 

- Ceres seq_id 1595763 

- Location of start within SEQ ID NO 945: at 124 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 948: from 33 to 78 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 479

- gi No. 3047079 

- Description: (AF058914) Arabidopsis thaliana transcription factor 
ATMYB4 (GB:X95297) [Arabidopsis thaliana] 

- % Identity: 84.7 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 948: from 11 to 95 

Maximum Length Sequence: 

related to: 
Clone IDs: 

252033 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 949

- Ceres seq_id 1595764 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 950 

- Ceres seq_id 1595765

- Location of start within SEQ ID NO 949: at 131 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ubiE/COQ5 methyltransferase family

- Location within SEQ ID NO 950: from 34 to 160 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 951 

- Ceres seq_id 1595766

- Location of start within SEQ ID NO 949: at 305 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ubiE/COQ5 methyltransferase family 

- Location within SEQ ID NO 951: from 1 to 102 aa. 

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 952

- Ceres seq_id 1595767

- Location of start within SEQ ID NO 949: at 317 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ubiE/COQ5 methyltransferase family

- Location within SEQ ID NO 952: from 1 to 98 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

250382 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 953 

- Ceres seq_id 1595783 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 954 

- Ceres seq_id 1595784 

- Location of start within SEQ ID NO 953: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Thioester dehydrase 

- Location within SEQ ID NO 954: from 83 to 213 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 955 

- Ceres seq_id 1595785 

- Location of start within SEQ ID NO 953: at 21 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thioester dehydrase 

- Location within SEQ ID NO 955: from 77 to 207 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 956 

- Ceres seq_id 1595786

- Location of start within SEQ ID NO 953: at 243 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thioester dehydrase 

- Location within SEQ ID NO 956: from 3 to 133 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

253398 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 957 

- Ceres seq_id 1595787 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 958

- Ceres seq_id 1595788

- Location of start within SEQ ID NO 957: at 121 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 480

- gi No. 2245107

- Description: (Z97343) thioesterase like protein [Arabidopsis thaliana]

- % Identity: 92.6 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 958: from 62 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 959 

- Ceres seq_id 1595789 

- Location of start within SEQ ID NO 957: at 756 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences



- Alignment No. 481 

- gi No. 2245107 

- Description: (Z97343) thioesterase like protein [Arabidopsis thaliana]

- % Identity: 98.9

- Alignment Length: 92

- Location of Alignment in SEQ ID NO 959: from 2 to 93

Maximum Length Sequence:

related to:
Clone IDs:

254137

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 960

- Ceres seq_id 1595790
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 961

- Ceres seq_id 1595791

- Location of start within SEQ ID NO 960: at 517 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Transthyretin precursor (formerly prealbumin)

- Location within SEQ ID NO 961: from 67 to 177 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 962

- Ceres seq_id 1595792

- Location of start within SEQ ID NO 960: at 526 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Transthyretin precursor (formerly prealbumin)

- Location within SEQ ID NO 962: from 64 to 174 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 963 

- Ceres seq_id 1595793 

- Location of start within SEQ ID NO 960: at 547 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Transthyretin precursor (formerly prealbumin)

- Location within SEQ ID NO 963: from 57 to 167 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

255203 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 964 

- Ceres seq_id 1595794 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 965 

- Ceres seq_id 1595795

- Location of start within SEQ ID NO 964: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S19e 

- Location within SEQ ID NO 965: from 8 to 142 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 482 

- gi No. 730456 

- Description: 40S RIBOSOMAL PROTEIN S19 

- % Identity: 73.9 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 965: from 5 to 142 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 966 

- Ceres seq_id 1595796

- Location of start within SEQ ID NO 964: at 11 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S19e 

- Location within SEQ ID NO 966: from 5 to 139 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 483

- gi No. 730456 

- Description: 40S RIBOSOMAL PROTEIN S19 

- % Identity: 73.9 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 966: from 2 to 139 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 967 

- Ceres seq_id 1595797 

- Location of start within SEQ ID NO 964: at 191 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S19e

- Location within SEQ ID NO 967: from 1 to 79 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 484 

- gi No. 730456 

- Description: 40S RIBOSOMAL PROTEIN S19 

- % Identity: 73.9 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 967: from 1 to 79 

Maximum Length Sequence: 

related to: 
Clone IDs: 

103357 

256431 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 968

- Ceres seq_id 1595802 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 969 

- Ceres seq_id 1595803 

- Location of start within SEQ ID NO 968: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 485

- gi No. 2134207 

- Description: protamine II-l - painted turtle 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 969: from 93 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 970 

- Ceres seq_id 1595804 

- Location of start within SEQ ID NO 968: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

256581 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 971 

- Ceres seq_id 1595805 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 972 

- Ceres seq_id 1595806 

- Location of start within SEQ ID NO 971: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peptidase family M3 

- Location within SEQ ID NO 972: from 97 to 177 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 973 

- Ceres seq_id 1595807 

- Location of start within SEQ ID NO 971: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peptidase family M3 

- Location within SEQ ID NO 973: from 69 to 149 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

267129 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 974

- Ceres seq_id 1595812
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 975 

- Ceres seq_id 1595813 

- Location of start within SEQ ID NO 974: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- WD domain, G-beta repeat 

- Location within SEQ ID NO 975: from 99 to 147 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 486

- gi No. 5668798

- Description: (AC007519) Contains 6 PF|00400 WD40 G-beta repeat
domains. [Arabidopsis thaliana] 

- % Identity: 97.4 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 975: from 1 to 152 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 976

- Ceres seq_id 1595814

- Location of start within SEQ ID NO 974: at 104 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat

- Location within SEQ ID NO 976: from 65 to 113 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 487

- gi No. 5668798

- Description: (AC007519) Contains 6 PF|00400 WD40 G-beta repeat
domains. [Arabidopsis thaliana] 

- % Identity: 97.4 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 976: from 1 to 118

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 977

- Ceres seq_id 1595815

- Location of start within SEQ ID NO 974: at 152 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat 

- Location within SEQ ID NO 977: from 49 to 97 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 488

- gi No. 5668798

- Description: (AC007519) Contains 6 PF|00400 WD40 G-beta repeat
domains. [Arabidopsis thaliana] 

- % Identity: 97.4 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 977: from 1 to 102 

Maximum Length Sequence: 

related to: 
Clone IDs: 

268013 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 978 

- Ceres seq_id 1595820 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 979

- Ceres seq_id 1595821

- Location of start within SEQ ID NO 978: at 259 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 489 

- gi No. 4914429 

- Description: (AL050351) SEC14-like protein [Arabidopsis thaliana] 

- % Identity: 90.6 

- Alignment Length: 64 

- Location of Alignment in SEQ ID NO 979: from 1 to 64

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 980 

- Ceres seq_id 1595822 

- Location of start within SEQ ID NO 978: at 404 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 490

- gi No. 4914429 

- Description: (AL050351) SEC14-like protein [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 66 

- Location of Alignment in SEQ ID NO 980: from 9 to 73

Maximum Length Sequence: 

related to: 
Clone IDs: 

263500 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 981

- Ceres seq_id 1595829
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 982 

- Ceres seq_id 1595830 

- Location of start within SEQ ID NO 981: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 491

- gi No. 2245108 

- Description: (Z97343) EREBP-4 like protein [Arabidopsis thaliana] 

- % Identity: 75.9 

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 982: from 22 to 50 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 983 

- Ceres seq_id 1595831

- Location of start within SEQ ID NO 981: at 11 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 492

- gi No. 2245108 

- Description: (Z97343) EREBP-4 like protein [Arabidopsis thaliana] 

- % Identity: 75.9 

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 983: from 19 to 47 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 984 

- Ceres seq_id 1595832 

- Location of start within SEQ ID NO 981: at 300 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

254758 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 985 

- Ceres seq_id 1595837 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 986

- Ceres seq_id 1595838

- Location of start within SEQ ID NO 985: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- TBC domain 

- Location within SEQ ID NO 986: from 156 to 275 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 987 



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 1 
Page 212 



- Ceres seq_id 1595839 

- Location of start within SEQ ID NO 985: at 51 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- TBC domain

- Location within SEQ ID NO 987: from 140 to 259 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 988 

- Ceres seq_id 1595840

- Location of start within SEQ ID NO 985: at 408 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- TBC domain 

- Location within SEQ ID NO 988: from 21 to 140 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

256380 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 989

- Ceres seq_id 1595841 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 990 

- Ceres seq_id 1595842

- Location of start within SEQ ID NO 989: at 115 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 4 93 

- gi No. 2244825 

- Description: (Z97336) light induced protein like [Arabidopsis 

thaliana] 

- % Identity: 77.4 

- Alignment Length: 93 

- Location of Alignment in SEQ ID NO 990: from 1 to 89

Maximum Length Sequence: 

related to: 
Clone IDs: 

260488 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 991 

- Ceres seq_id 1595847
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 992 

- Ceres seq_id 1595848 

- Location of start within SEQ ID NO 991: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 60s Acidic ribosomal protein 

- Location within SEQ ID NO 992: from 95 to 187 aa.

(Dp) Related Amino Acid Sequences

Maximum Length Sequence: 

related to: 
Clone IDs: 

266553 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 993

- Ceres seq_id 1595853 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 994 

- Ceres seq_id 1595854 

- Location of start within SEQ ID NO 993: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 994: from 81 to 326 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 494

- gi No. 3047109 

- Description: (AF058919) No definition line found [Arabidopsis thaliana]

- % Identity: 88.5

- Alignment Length: 236

- Location of Alignment in SEQ ID NO 994: from 24 to 244

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 995 

- Ceres seq_id 1595855 

- Location of start within SEQ ID NO 993: at 333 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 995: from 1 to 216 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 495

- gi No. 3047109

- Description: (AF058919) No definition line found [Arabidopsis thaliana]

- % Identity: 88.5

- Alignment Length: 236

- Location of Alignment in SEQ ID NO 995: from 1 to 134

Maximum Length Sequence:

related to:
Clone IDs:

268851

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 996

- Ceres seq_id 1595856
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 997

- Ceres seq_id 1595857

- Location of start within SEQ ID NO 996: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 496

- gi No. 1351978 

- Description: 3-PHOSPHOSHIKIMATE 1-CARBOXYVINYLTRANSFERASE
PRECURSOR (3-ENOLPYRUVYLSHIKIMATE-5-PHOSPHATE SYNTHASE) (EPSP SYNTHASE)
>gi|295790|emb|CAA29828| (X06613) EPSP [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 53 

- Location of Alignment in SEQ ID NO 997: from 60 to 112 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 998 

- Ceres seq_id 1595858 

- Location of start within SEQ ID NO 996: at 179 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 497

- gi No. 1351978 

- Description: 3-PHOSPHOSHIKIMATE 1-CARBOXYVINYLTRANSFERASE
PRECURSOR (3-ENOLPYRUVYLSHIKIMATE-5-PHOSPHATE SYNTHASE) (EPSP SYNTHASE)
>gi|295790|emb|CAA29828| (X06613) EPSP [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 53 

- Location of Alignment in SEQ ID NO 998: from 1 to 53 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 999 

- Ceres seq_id 1595859 

- Location of start within SEQ ID NO 996: at 385 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase)

- Location within SEQ ID NO 999: from 15 to 116 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 498

- gi No. 1351978 

- Description: 3-PHOSPHOSHIKIMATE 1-CARBOXYVINYLTRANSFERASE
PRECURSOR (3-ENOLPYRUVYLSHIKIMATE-5-PHOSPHATE SYNTHASE) (EPSP SYNTHASE)
>gi|295790|emb|CAA29828| (X06613) EPSP [Arabidopsis thaliana]

- % Identity: 95.5 

- Alignment Length: 134 

- Location of Alignment in SEQ ID NO 999: from 1 to 116 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1000 

- Ceres seq_id 1595906 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1001 

- Ceres seq_id 1595907 

- Location of start within SEQ ID NO 1000: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pectate lyase

- Location within SEQ ID NO 1001: from 1 to 258 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 499

- gi No. 2435395 

- Description: (U63550) pectate lyase [Fragaria x ananassa] 

- % Identity: 73.5 

- Alignment Length: 223 

- Location of Alignment in SEQ ID NO 1001: from 1 to 223 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1002 

- Ceres seq_id 1595908 

- Location of start within SEQ ID NO 1000: at 284 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pectate lyase 

- Location within SEQ ID NO 1002: from 1 to 164 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 500 

- gi No. 2435395 

- Description: (U63550) pectate lyase [Fragaria x ananassa] 

- % Identity: 73.5 

- Alignment Length: 223 

- Location of Alignment in SEQ ID NO 1002: from 1 to 129 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1003 

- Ceres seq_id 1595909 

- Location of start within SEQ ID NO 1000: at 341 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pectate lyase

- Location within SEQ ID NO 1003: from 1 to 145 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 501

- gi No. 2435395 

- Description: (U63550) pectate lyase [Fragaria x ananassa] 

- % Identity: 73.5 

- Alignment Length: 223 

- Location of Alignment in SEQ ID NO 1003: from 1 to 110 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1004 

- Ceres seq_id 1595931 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1005 

- Ceres seq_id 1595932 

- Location of start within SEQ ID NO 1004: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 502

- gi No. 4325374

- Description: (AF128396) No definition line found [Arabidopsis thaliana]

- % Identity: 79.3 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1005: from 68 to 154 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1006 

- Ceres seq_id 1595934 

- Location of start within SEQ ID NO 1004: at 34 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 503 

- gi No. 4325374 

- Description: (AF128396) No definition line found [Arabidopsis thaliana]

- % Identity: 79.3 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1006: from 57 to 143 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1007 

- Ceres seq_id 1595943 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1008 

- Ceres seq_id 1595944

- Location of start within SEQ ID NO 1007: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 504 

- gi No. 2252842 

- Description: (AF013293) No definition line found [Arabidopsis thaliana]

- % Identity: 86.8 

- Alignment Length: 190 

- Location of Alignment in SEQ ID NO 1008: from 1 to 190 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1009 

- Ceres seq_id 1595946 

- Location of start within SEQ ID NO 1007: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 505 

- gi No. 2252842 

- Description: (AF013293) No definition line found [Arabidopsis thaliana]

- % Identity: 86.8 

- Alignment Length: 190 

- Location of Alignment in SEQ ID NO 1009: from 1 to 156 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1010 

- Ceres seq_id 1595992 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1011 

- Ceres seq_id 1595993 

- Location of start within SEQ ID NO 1010: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1011: from 1 to 132 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 506

- gi No. 4456682 

- Description: (AJ224336) MAP kinase [Medicago sativa] 

- % Identity: 82.6 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 1011: from 1 to 132 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1012 

- Ceres seq_id 1595994 

- Location of start within SEQ ID NO 1010: at 37 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1012: from 1 to 120 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 507 

- gi No. 4456682 

- Description: (AJ224336) MAP kinase [Medicago sativa] 

- % Identity: 82.6 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 1012: from 1 to 120 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1013 

- Ceres seq_id 1595995 

- Location of start within SEQ ID NO 1010: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1013: from 1 to 85 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 508 

- gi No. 4456682 

- Description: (AJ224336) MAP kinase [Medicago sativa] 

- % Identity: 82.6 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 1013: from 1 to 85

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1014

- Ceres seq_id 1596012 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1015 

- Ceres seq_id 1596013 

- Location of start within SEQ ID NO 1014: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 509 

- gi No. 2832674 

- Description: (AL021712) fibrillin precursor-like protein
[Arabidopsis thaliana] 

- % Identity: 97.2 

- Alignment Length: 176

- Location of Alignment in SEQ ID NO 1015: from 1 to 176
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1016 

- Ceres seq_id 1596022 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1017 

- Ceres seq_id 1596023 

- Location of start within SEQ ID NO 1016: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1017: from 105 to 287 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 510 

- gi No. 4115365 

- Description: (AC005957) reverse transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 70.2 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 1017: from 63 to 213 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1018 

- Ceres seq_id 1596024 

- Location of start within SEQ ID NO 1016: at 518 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Reverse transcriptase (RNA-dependent DNA polymerase)

- Location within SEQ ID NO 1018: from 1 to 115 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 511 

- gi No. 4115365 

- Description: (AC005957) reverse transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 70.2 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 1018: from 1 to 41

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1019 

- Ceres seq_id 1596025 

- Location of start within SEQ ID NO 1016: at 548 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1019: from 1 to 105 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 512 

- gi No. 4115365 

- Description: (AC005957) reverse transcriptase-like protein 
[Arabidopsis thaliana] 

- % Identity: 70.2 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 1019: from 1 to 31 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1020 

- Ceres seq_id 1596038 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1021 

- Ceres seq_id 1596039 

- Location of start within SEQ ID NO 1020: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Lipase/Acylhydrolase with GDSL-like motif

- Location within SEQ ID NO 1021: from 41 to 145 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 513 

- gi No. 2347208 

- Description: (AC002338) APG protein isolog [Arabidopsis thaliana] 

- % Identity: 90.3 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 1021: from 25 to 55 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1022 

- Ceres seq_id 1596040

- Location of start within SEQ ID NO 1020: at 310 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1023 

- Ceres seq_id 1596041 

- Location of start within SEQ ID NO 1020: at 364 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1024 

- Ceres seq_id 1596062 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1025 

- Ceres seq_id 1596063

- Location of start within SEQ ID NO 1024: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NB-ARC domain

- Location within SEQ ID NO 1025: from 60 to 343 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 514 

- gi No. 5302806 

- Description: (Z97342) disease resistance RPP5 like protein 
[Arabidopsis thaliana] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1025: from 485 to 498

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1026 

- Ceres seq_id 1596065 

- Location of start within SEQ ID NO 1024: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NB-ARC domain 

- Location within SEQ ID NO 1026: from 21 to 304 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 515 

- gi No. 5302806 

- Description: (Z97342) disease resistance RPP5 like protein 
[Arabidopsis thaliana] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1026: from 446 to 459 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1027 

- Ceres seq_id 1596069 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1028 

- Ceres seq_id 1596070 

- Location of start within SEQ ID NO 1027: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 516 

- gi No. 2213582 

- Description: (AC000348) T7N9.2 [Arabidopsis thaliana] 

- % Identity: 79.3

- Alignment Length: 82

- Location of Alignment in SEQ ID NO 1028: from 55 to 136

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1029 

- Ceres seq_id 1596071 

- Location of start within SEQ ID NO 1027: at 52 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 517 

- gi No. 2213582 

- Description: (AC000348) T7N9.2 [Arabidopsis thaliana] 

- % Identity: 79.3 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 1029: from 38 to 119 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1030 

- Ceres seq_id 1596072 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1031 

- Ceres seq_id 1596073

- Location of start within SEQ ID NO 1030: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 518 

- gi No. 3169719 

- Description: (AF007109) similar to yeast dcpl [Arabidopsis thaliana]

- % Identity: 98.6 

- Alignment Length: 355 

- Location of Alignment in SEQ ID NO 1031: from 1 to 355 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1032 

- Ceres seq_id 1596075 

- Location of start within SEQ ID NO 1030: at 232 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 519 

- gi No. 3169719 

- Description: (AF007109) similar to yeast dcpl [Arabidopsis thaliana]

- % Identity: 98.6 

- Alignment Length: 355 

- Location of Alignment in SEQ ID NO 1032: from 1 to 278 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1033 

- Ceres seq_id 1596076
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1034

- Ceres seq_id 1596077

- Location of start within SEQ ID NO 1033: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1034: from 7 to 207 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 520

- gi No. 1709970 

- Description: 60S RIBOSOMAL PROTEIN L10A 

- % Identity: 94.4 

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1034: from 1 to 126 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1035 

- Ceres seq_id 1596079 

- Location of start within SEQ ID NO 1033: at 184 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1035: from 1 to 146 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 521 

- gi No. 1709970 

- Description: 60S RIBOSOMAL PROTEIN L10A 

- % Identity: 94.4 

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1035: from 1 to 65 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1036 

- Ceres seq_id 1596087 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1037 

- Ceres seq_id 1596088

- Location of start within SEQ ID NO 1036: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1037: from 322 to 561 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 522 

- gi No. 3695391 

- Description: (AF096371) No definition line found [Arabidopsis thaliana]

- % Identity: 91.4

- Alignment Length: 232

- Location of Alignment in SEQ ID NO 1037: from 1 to 217

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1038

- Ceres seq_id 1596090

- Location of start within SEQ ID NO 1036: at 52 nt.

(C) Nomination and Annotation of Domains within Predicted

Polypeptide (s) 

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1038: from 305 to 544 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 523 

- gi No. 3695391 

- Description: (AF096371) No definition line found [Arabidopsis 

thaliana] 

- % Identity: 91.4 

- Alignment Length: 232 

- Location of Alignment in SEQ ID NO 1038: from 1 to 200 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1039 

- Ceres seq_id 1596095 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1040 

- Ceres seq_id 1596096 

- Location of start within SEQ ID NO 1039: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1040: from 295 to 534 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 524

- gi No. 3695391

- Description: (AF096371) No definition line found [Arabidopsis thaliana]

- % Identity: 96.5

- Alignment Length: 198

- Location of Alignment in SEQ ID NO 1040: from 553 to 750

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1041

- Ceres seq_id 1596098

- Location of start within SEQ ID NO 1039: at 52 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic DNA topoisomerase I

- Location within SEQ ID NO 1041: from 278 to 517 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 525

- gi No. 3695391

- Description: (AF096371) No definition line found [Arabidopsis thaliana]

- % Identity: 96.5

- Alignment Length: 198

- Location of Alignment in SEQ ID NO 1041: from 536 to 733
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1042

- Ceres seq_id 1596099 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1043 

- Ceres seq_id 1596100 

- Location of start within SEQ ID NO 1042: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase

- Location within SEQ ID NO 1043: from 103 to 312 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1044 

- Ceres seq_id 1596102

- Location of start within SEQ ID NO 1042: at 205 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase

- Location within SEQ ID NO 1044: from 35 to 244 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1045 

- Ceres seq_id 1596137 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1046 

- Ceres seq_id 1596138 

- Location of start within SEQ ID NO 1045: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Retroviral aspartyl proteases

- Location within SEQ ID NO 1046: from 307 to 390 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 526

- gi No. 3047066 

- Description: (AF058825) contains similarity to retrovirus-related 
POL polyproteins [Arabidopsis thaliana] 

- % Identity: 82.1 

- Alignment Length: 301 

- Location of Alignment in SEQ ID NO 1046: from 1 to 285 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1047 

- Ceres seq_id 1596140

- Location of start within SEQ ID NO 1045: at 97 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Retroviral aspartyl proteases 

- Location within SEQ ID NO 1047: from 275 to 358 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 527

- gi No. 3047066 

- Description: (AF058825) contains similarity to retrovirus-related 
POL polyproteins [Arabidopsis thaliana] 

- % Identity: 82.1 

- Alignment Length: 301 

- Location of Alignment in SEQ ID NO 1047: from 1 to 253 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1048 

- Ceres seq_id 1596141
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1049

- Ceres seq_id 1596142 

- Location of start within SEQ ID NO 1048: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1049: from 56 to 165 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 528 

- gi No. 1419123 

- Description: (Z75497) reverse transcriptase [Oryza sativa] 

- % Identity: 70.8 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 1049: from 134 to 157 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1050 

- Ceres seq_id 1596144 

- Location of start within SEQ ID NO 1048: at 523 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1051 

- Ceres seq_id 1596153 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1052 

- Ceres seq_id 1596154 

- Location of start within SEQ ID NO 1051: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1052: from 246 to 482 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1053 

- Ceres seq_id 1596155 

- Location of start within SEQ ID NO 1051: at 285 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1053: from 152 to 388 aa. 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1054 

- Ceres seq_id 1596156 

- Location of start within SEQ ID NO 1051: at 378 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic DNA topoisomerase I 

- Location within SEQ ID NO 1054: from 121 to 357 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1055 

- Ceres seq_id 1596158 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1056 

- Ceres seq_id 1596159 

- Location of start within SEQ ID NO 1055: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Papain family cysteine protease 

- Location within SEQ ID NO 1056: from 138 to 353 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 529 

- gi No. 4836904 

- Description: (AC007369) lcl|prt_seq No definition line found
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 356

- Location of Alignment in SEQ ID NO 1056: from 1 to 356 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1057 

- Ceres seq_id 1596161 

- Location of start within SEQ ID NO 1055: at 328 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Papain family cysteine protease 

- Location within SEQ ID NO 1057: from 29 to 244 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 530 

- gi No. 4836904 

- Description: (AC007369) lcl|prt_seq No definition line found 
[Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 356

- Location of Alignment in SEQ ID NO 1057: from 1 to 247
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1058 

- Ceres seq_id 1596204 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1059 

- Ceres seq_id 1596205 

- Location of start within SEQ ID NO 1058: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cytochrome P450

- Location within SEQ ID NO 1059: from 1 to 79 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 531 

- gi No. 5280993 

- Description: (Z97338) cytochrome P450 like protein [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 1059: from 1 to 79

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1060 

- Ceres seq_id 1596206 

- Location of start within SEQ ID NO 1058: at 52 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cytochrome P450

- Location within SEQ ID NO 1060: from 1 to 62 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 532 

- gi No. 5280993 

- Description: (Z97338) cytochrome P450 like protein [Arabidopsis thaliana]

- % Identity: 100

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 1060: from 1 to 62 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1061 

- Ceres seq_id 1596252 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1062

- Ceres seq_id 1596253

- Location of start within SEQ ID NO 1061: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 533

- gi No. 3377846

- Description: (AF076274) No definition line found [Arabidopsis thaliana]

- % Identity: 80 

- Alignment Length: 120

- Location of Alignment in SEQ ID NO 1062: from 37 to 146 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1063 

- Ceres seq_id 1596254 

- Location of start within SEQ ID NO 1061: at 17 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 534 

- gi No. 3377846 

- Description: (AF076274) No definition line found [Arabidopsis thaliana]

- % Identity: 80 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 1063: from 32 to 141 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1064 

- Ceres seq_id 1596255 

- Location of start within SEQ ID NO 1061: at 164 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 535 

- gi No. 3377846 

- Description: (AF076274) No definition line found [Arabidopsis thaliana]

- % Identity: 80

- Alignment Length: 120

- Location of Alignment in SEQ ID NO 1064: from 1 to 92 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1065

- Ceres seq_id 1596264 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1066 

- Ceres seq_id 1596265 

- Location of start within SEQ ID NO 1065: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 536

- gi No. 3377846

- Description: (AF076274) No definition line found [Arabidopsis thaliana]

- % Identity: 74.2 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 1066: from 12 to 42 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1067

- Ceres seq_id 1596266

- Location of start within SEQ ID NO 1065: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 537 

- gi No. 3377846 

- Description: (AF076274) No definition line found [Arabidopsis thaliana]

- % Identity: 74.2 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 1067: from 1 to 12 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1068 

- Ceres seq_id 1596267 

- Location of start within SEQ ID NO 1065: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1069 

- Ceres seq_id 1596280 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1070 

- Ceres seq_id 1596281 

- Location of start within SEQ ID NO 1069: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1070: from 2 to 56 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 538 

- gi No. 5724766 

- Description: (AF160181) contains similarity to retroviral 
intergrases; may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 72.5 

- Alignment Length: 200 

- Location of Alignment in SEQ ID NO 1070: from 274 to 471 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1071 

- Ceres seq_id 1596283 

- Location of start within SEQ ID NO 1069: at 16 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1071: from 1 to 51 aa. 
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(Dp) Related Amino Acid Sequences 

- Alignment No. 5 39 

- gi No. 5724766 
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- Description: (AF160181) contains similarity to retroviral 
intergrases; may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 72.5 

- Alignment Length: 200

- Location of Alignment in SEQ ID NO 1071: from 269 to 466 



Maximum Length Sequence: 



(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1072 

- Ceres seq_id 1596332 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1073 

- Ceres seq_id 1596333 

- Location of start within SEQ ID NO 1072: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Fibronectin type III domain 

- Location within SEQ ID NO 1073: from 20 to 106 aa. 



(Dp) Related Amino Acid Sequences



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1074 

- Ceres seq_id 1596335 

- Location of start within SEQ ID NO 1072: at 95 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1075 

- Ceres seq_id 1596336 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1076

- Ceres seq_id 1596337 

- Location of start within SEQ ID NO 1075: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 540

- gi No. 1084367 

- Description: C9 protein - kidney bean 

- % Identity: 78.1 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1076: from 106 to 137 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1077 

- Ceres seq_id 1596338 

- Location of start within SEQ ID NO 1075: at 40 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
- Alignment No. 541 
- gi No. 1084367

- Description: C9 protein - kidney bean 

- % Identity: 78.1 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1077: from 93 to 124 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1078 

- Ceres seq_id 1596339 

- Location of start within SEQ ID NO 1075: at 49 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 542 

- gi No. 1084367 

- Description: C9 protein - kidney bean 

- % Identity: 78.1 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1078: from 90 to 121 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1079

- Ceres seq_id 1596343 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1080 

- Ceres seq_id 1596344 

- Location of start within SEQ ID NO 1079: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Uncharacterized protein family UPF0025 

- Location within SEQ ID NO 1080: from 2 to 75 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1081 

- Ceres seq_id 1596346 

- Location of start within SEQ ID NO 1079: at 37 nt .

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Uncharacterized protein family UPF0025 

- Location within SEQ ID NO 1081: from 1 to 63 aa.

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1082 

- Ceres seq_id 1596350 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1083 

- Ceres seq_id 1596351 

- Location of start within SEQ ID NO 1082: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 543 

- gi No. 3123327 

- Description: (AJ005927) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 79.6 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1083: from 1 to 113 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1084 

- Ceres seq_id 1596368 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1085 

- Ceres seq_id 1596369 

- Location of start within SEQ ID NO 1084: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 544 

- gi No. 5732030 

- Description: (AF147260) contains similarity to Drosophila 
suppressor of sable protein (GB:M57889) [Arabidopsis thaliana] 

- % Identity: 72.3 

- Alignment Length: 47 

- Location of Alignment in SEQ ID NO 1085: from 1 to 47 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1086 

- Ceres seq_id 1596371 

- Location of start within SEQ ID NO 1084: at 169 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1087 

- Ceres seq_id 1596407 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1088 

- Ceres seq_id 1596408 

- Location of start within SEQ ID NO 1087: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protein phosphatase 2C 

- Location within SEQ ID NO 1088: from 13 to 183 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1089 

- Ceres seq_id 1596410 
- Location of start within SEQ ID NO 1087: at 31 nt .

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protein phosphatase 2C 

- Location within SEQ ID NO 1089: from 3 to 173 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1090 

- Ceres seq_id 1596461 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1091 

- Ceres seq_id 1596462 

- Location of start within SEQ ID NO 1090: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 545 

- gi No. 5080810 

- Description: (AC007258) Very similar to helicases [Arabidopsis
thaliana]

- % Identity: 89.3 

- Alignment Length: 28

- Location of Alignment in SEQ ID NO 1091: from 379 to 405 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1092 

- Ceres seq_id 1596463 

- Location of start within SEQ ID NO 1090: at 58 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 546

- gi No. 5080810 

- Description: (AC007258) Very similar to helicases [Arabidopsis
thaliana]

- % Identity: 89.3 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1092: from 360 to 386

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1093 

- Ceres seq_id 1596464 

- Location of start within SEQ ID NO 1090: at 124 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 547 

- gi No. 5080810 

- Description: (AC007258) Very similar to helicases [Arabidopsis
thaliana]

- % Identity: 89.3 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1093: from 338 to 364 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1094 

- Ceres seq_id 1596522 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1095 

- Ceres seq_id 1596523 

- Location of start within SEQ ID NO 1094: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 548

- gi No. 3377838 

- Description: (AF075598) No definition line found [Arabidopsis
thaliana]

- % Identity: 73.1 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 1095: from 251 to 301 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1096 

- Ceres seq_id 1596532
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1097 

- Ceres seq_id 1596533 

- Location of start within SEQ ID NO 1096: at 1 nt .

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 1097: from 19 to 139 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1098 

- Ceres seq_id 1596534 

- Location of start within SEQ ID NO 1096: at 187 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 1098: from 1 to 77 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1099 

- Ceres seq_id 1596611 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1100 

- Ceres seq_id 1596612 

- Location of start within SEQ ID NO 1099: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase)

- Location within SEQ ID NO 1100: from 91 to 194 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1101 

- Ceres seq_id 1596614 

- Location of start within SEQ ID NO 1099: at 67 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1101: from 69 to 172 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1102 

- Ceres seq_id 1596776 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1103 

- Ceres seq_id 1596777 

- Location of start within SEQ ID NO 1102: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ligand-gated ion channel 

- Location within SEQ ID NO 1103: from 269 to 329 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1104 

- Ceres seq_id 1596779 

- Location of start within SEQ ID NO 1102: at 55 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ligand-gated ion channel 

- Location within SEQ ID NO 1104: from 251 to 311 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1105 

- Ceres seq_id 1596784
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1106 

- Ceres seq_id 1596785 

- Location of start within SEQ ID NO 1105: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Kelch motif 
- Location within SEQ ID NO 1106: from 102 to 148 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1107 

- Ceres seq_id 1596786

- Location of start within SEQ ID NO 1105: at 37 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Kelch motif 

- Location within SEQ ID NO 1107: from 90 to 136 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1108 

- Ceres seq_id 1596787 

- Location of start within SEQ ID NO 1105: at 241 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Kelch motif 

- Location within SEQ ID NO 1108: from 22 to 68 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1109 

- Ceres seq_id 1596800 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1110 

- Ceres seq_id 1596801 

- Location of start within SEQ ID NO 1109: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- PH domain 

- Location within SEQ ID NO 1110: from 226 to 278 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1111 

- Ceres seq_id 1596802 

- Location of start within SEQ ID NO 1109: at 80 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- PH domain 

- Location within SEQ ID NO 1111: from 200 to 252 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1112 

- Ceres seq_id 1596803 

- Location of start within SEQ ID NO 1109: at 173 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- PH domain 

- Location within SEQ ID NO 1112: from 169 to 221 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1113 

- Ceres seq_id 1596804 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1114 

- Ceres seq_id 1596805 

- Location of start within SEQ ID NO 1113: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- PH domain 

- Location within SEQ ID NO 1114: from 240 to 292 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1115 

- Ceres seq_id 1596806 

- Location of start within SEQ ID NO 1113: at 79 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- PH domain 

- Location within SEQ ID NO 1115: from 214 to 266 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1116 

- Ceres seq_id 1596807 

- Location of start within SEQ ID NO 1113: at 172 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- PH domain 

- Location within SEQ ID NO 1116: from 183 to 235 aa. 
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1117 

- Ceres seq_id 1596885 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1118 

- Ceres seq_id 1596886 

- Location of start within SEQ ID NO 1117: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
- Uncharacterized protein family UPF0016

- Location within SEQ ID NO 1118: from 1 to 91 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 549

- gi No. 5734713 

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana]

- % Identity: 84.8 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 1118: from 1 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1119 

- Ceres seq_id 1596887 

- Location of start within SEQ ID NO 1117: at 8 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Uncharacterized protein family UPF0016 

- Location within SEQ ID NO 1119: from 1 to 89 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 550

- gi No. 5734713 

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana] 

- % Identity: 84.8 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 1119: from 1 to 134 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1120 

- Ceres seq_id 1596888 

- Location of start within SEQ ID NO 1117: at 62 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Uncharacterized protein family UPF0016 

- Location within SEQ ID NO 1120: from 1 to 71 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 551 

- gi No. 5734713

- Description: (AC008075) Is a member of PF|01169 Uncharacterized
(transmembrane domain) protein family. [Arabidopsis thaliana] 

- % Identity: 84.8 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 1120: from 1 to 116 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1121 

- Ceres seq_id 1596933 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1122 

- Ceres seq_id 1596934 

- Location of start within SEQ ID NO 1121: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1122: from 444 to 642 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1123 

- Ceres seq_id 1596936 

- Location of start within SEQ ID NO 1121: at 361 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1123: from 324 to 522 aa.
(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 1124 
- Ceres seq_id 1596960
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1125 

- Ceres seq_id 1596961 

- Location of start within SEQ ID NO 1124: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 552 

- gi No. 2289011 

- Description: (AC002335) MYB transcription factor isolog 
[Arabidopsis thaliana] 

- % Identity: 95.8 

- Alignment Length: 381 

- Location of Alignment in SEQ ID NO 1125: from 1 to 381 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1126

- Ceres seq_id 1596963 

- Location of start within SEQ ID NO 1124: at 301 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 553 

- gi No. 2289011 

- Description: (AC002335) MYB transcription factor isolog 
[Arabidopsis thaliana] 

- % Identity: 95.8 

- Alignment Length: 381 

- Location of Alignment in SEQ ID NO 1126: from 1 to 281 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1127 

- Ceres seq_id 1596968 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1128 

- Ceres seq_id 1596969

- Location of start within SEQ ID NO 1127: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- retroviral pol related endonuclease 

- Location within SEQ ID NO 1128: from 5 to 82 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 554

- gi No. 5724766 

- Description: (AF160181) contains similarity to retroviral 
intergrases; may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 78.5 

- Alignment Length: 107 

- Location of Alignment in SEQ ID NO 1128: from 1 to 105 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1129 

- Ceres seq_id 1596971 

- Location of start within SEQ ID NO 1127: at 79 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- retroviral pol related endonuclease 

- Location within SEQ ID NO 1129: from 1 to 56 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 555 

- gi No. 5724766 

- Description: (AF160181) contains similarity to retroviral 
intergrases; may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 78.5 

- Alignment Length: 107 

- Location of Alignment in SEQ ID NO 1129: from 1 to 79
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1130 

- Ceres seq_id 1596976 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1131 

- Ceres seq_id 1596977 

- Location of start within SEQ ID NO 1130: at 1 nt .

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 556

- gi No. 5107819 

- Description: (AF149413) contains similarity to arabinosidase 
[Arabidopsis thaliana] 

- % Identity: 92.5 

- Alignment Length: 335 

- Location of Alignment in SEQ ID NO 1131: from 333 to 662 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1132 
- Ceres seq_id 1596979

- Location of start within SEQ ID NO 1130: at 370 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 557 

- gi No. 5107819 

- Description: (AF149413) contains similarity to arabinosidase 
[Arabidopsis thaliana] 

- % Identity: 92.5 

- Alignment Length: 335 

- Location of Alignment in SEQ ID NO 1132: from 210 to 539 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1133 

- Ceres seq_id 1596980 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1134 

- Ceres seq_id 1596981 

- Location of start within SEQ ID NO 1133: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 558 

- gi No. 4733967 

- Description: (AC007261) envelope-like protien [Arabidopsis
thaliana]

- % Identity: 97.2 

- Alignment Length: 108

- Location of Alignment in SEQ ID NO 1134: from 1 to 108 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1135 

- Ceres seq_id 1596983 

- Location of start within SEQ ID NO 1133: at 298 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 559

- gi No. 4733967 

- Description: (AC007261) envelope-like protien [Arabidopsis
thaliana]

- % Identity: 94.6 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 1135: from 9 to 100 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1136 

- Ceres seq_id 1596991 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1137 

- Ceres seq_id 1596992 

- Location of start within SEQ ID NO 1136: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 1137: from 95 to 139 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1138 

- Ceres seq_id 1596993 

- Location of start within SEQ ID NO 1136: at 202 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 1138: from 28 to 72 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1139 

- Ceres seq_id 1596994 

- Location of start within SEQ ID NO 1136: at 331 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1140 

- Ceres seq_id 1596999 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1141 

- Ceres seq_id 1597000 

- Location of start within SEQ ID NO 1140: at 119 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 560

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1141: from 1 to 228 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1142 

- Ceres seq_id 1597001 

- Location of start within SEQ ID NO 1140: at 152 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 561 

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 
- % Identity: 70

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1142: from 1 to 217 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1143 

- Ceres seq_id 1597002 

- Location of start within SEQ ID NO 1140: at 395 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 562 

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1143: from 1 to 136 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1144 

- Ceres seq_id 1597003 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1145 

- Ceres seq_id 1597004

- Location of start within SEQ ID NO 1144: at 118 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 563

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1145: from 1 to 228 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1146 

- Ceres seq_id 1597005 

- Location of start within SEQ ID NO 1144: at 151 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 564

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1146: from 1 to 217

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1147 

- Ceres seq_id 1597006 

- Location of start within SEQ ID NO 1144: at 394 nt . 
(C) Nomination and Annotation of Domains within Predicted

Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 565

- gi No. 2947059 

- Description: (AC002521) similar to myb transforming protein 
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 270

- Location of Alignment in SEQ ID NO 1147: from 1 to 136 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1148 

- Ceres seq_id 1597014 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1149 

- Ceres seq_id 1597015 

- Location of start within SEQ ID NO 1148: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1149: from 117 to 397 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1150 

- Ceres seq_id 1597016

- Location of start within SEQ ID NO 1148: at 141 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1150: from 71 to 351 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1151 

- Ceres seq_id 1597017 

- Location of start within SEQ ID NO 1148: at 207 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1151: from 49 to 329 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1152 

- Ceres seq_id 1597018 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1153 

- Ceres seq_id 1597019 
- Location of start within SEQ ID NO 1152: at 2 nt .

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Reverse transcriptase (RNA-dependent DNA polymerase)

- Location within SEQ ID NO 1153: from 117 to 397 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1154 

- Ceres seq_id 1597020 

- Location of start within SEQ ID NO 1152: at 140 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase)

- Location within SEQ ID NO 1154: from 71 to 351 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1155 

- Ceres seq_id 1597021 

- Location of start within SEQ ID NO 1152: at 206 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Reverse transcriptase (RNA-dependent DNA polymerase) 

- Location within SEQ ID NO 1155: from 49 to 329 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1156

- Ceres seq_id 1597022 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1157 

- Ceres seq_id 1597023 

- Location of start within SEQ ID NO 1156: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cyclic nucleotide-binding domain 

- Location within SEQ ID NO 1157: from 4 to 86 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1158 

- Ceres seq_id 1597024 

- Location of start within SEQ ID NO 1156: at 19 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Cyclic nucleotide-binding domain 

- Location within SEQ ID NO 1158: from 1 to 80 aa.



(Dp) Related Amino Acid Sequences 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1159 

- Ceres seq_id 1597025 

- Location of start within SEQ ID NO 1156: at 118 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1160 

- Ceres seq_id 1597076 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1161 

- Ceres seq_id 1597077 

- Location of start within SEQ ID NO 1160: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutathione S-transferases.

- Location within SEQ ID NO 1161: from 5 to 196 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 566

- gi No. 4006934 

- Description: (AJ012571) glutathione transferase [Arabidopsis
thaliana]

- % Identity: 74 

- Alignment Length: 219 

- Location of Alignment in SEQ ID NO 1161: from 1 to 219 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1162 

- Ceres seq_id 1597079 

- Location of start within SEQ ID NO 1160: at 40 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutathione S-transferases.

- Location within SEQ ID NO 1162: from 1 to 183 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 567

- gi No. 4006934

- Description: (AJ012571) glutathione transferase [Arabidopsis
thaliana]

- % Identity: 74

- Alignment Length: 219

- Location of Alignment in SEQ ID NO 1162: from 1 to 206
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1163

- Ceres seq_id 1597092
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1164

- Ceres seq_id 1597093

- Location of start within SEQ ID NO 1163: at 2 nt .

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Helicases conserved C-terminal domain 

- Location within SEQ ID NO 1164: from 2 to 75 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1165 

- Ceres seq_id 1597094 

- Location of start within SEQ ID NO 1163: at 65 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Helicases conserved C-terminal domain 

- Location within SEQ ID NO 1165: from 1 to 54 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1166 

- Ceres seq_id 1597095 

- Location of start within SEQ ID NO 1163: at 227 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1167 

- Ceres seq_id 1597129 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1168 

- Ceres seq_id 1597130 

- Location of start within SEQ ID NO 1167: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Purine/pyrimidine phosphoribosyl transferases

- Location within SEQ ID NO 1168: from 89 to 246 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 568

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8

- Alignment Length: 183

- Location of Alignment in SEQ ID NO 1168: from 77 to 259 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1169 

- Ceres seq_id 1597132 

- Location of start within SEQ ID NO 1167: at 229 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1169: from 13 to 170 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 569 

- gi No. 399046 

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 183 

- Location of Alignment in SEQ ID NO 1169: from 1 to 183 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1170 

- Ceres seq_id 1597162
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1171 

- Ceres seq_id 1597163 

- Location of start within SEQ ID NO 1170: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- FKBP-type peptidyl-prolyl cis-trans isomerases

- Location within SEQ ID NO 1171: from 122 to 224 aa . 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1172 

- Ceres seq_id 1597165 

- Location of start within SEQ ID NO 1170: at 253 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- FKBP-type peptidyl-prolyl cis-trans isomerases 

- Location within SEQ ID NO 1172: from 38 to 140 aa. 

(Dp) Related Amino Acid Sequences 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1173

- Ceres seq_id 1597170 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1174 

- Ceres seq_id 1597171 

- Location of start within SEQ ID NO 1173: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 570

- gi No. 5724774

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana]

- % Identity: 79.6

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 1174: from 42 to 90 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1175 

- Ceres seq_id 1597173 

- Location of start within SEQ ID NO 1173: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 571 

- gi No. 3047067

- Description: (AF058825) similar to Arabidopsis thaliana 
retrotransposon Athila (GB:X81801) [Arabidopsis thaliana] 

- % Identity: 79.2

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 1175: from 92 to 115 



Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1176

- Ceres seq_id 1597192 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1177 

- Ceres seq_id 1597193 

- Location of start within SEQ ID NO 1176: at 1 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 572 

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 96.6 

- Alignment Length: 117 

- Location of Alignment in SEQ ID NO 1177: from 1 to 117 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1178 

- Ceres seq_id 1597194 

- Location of start within SEQ ID NO 1176: at 58 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 573

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 96.6 

- Alignment Length: 117 

- Location of Alignment in SEQ ID NO 1178: from 1 to 98 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1179

- Ceres seq_id 1597195

- Location of start within SEQ ID NO 1176: at 118 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 574

- gi No. 4325351 

- Description: (AF128394) similar to Antirrhinum majus (garden 
snapdragon) TNP2 protein (GB:X57297) [Arabidopsis thaliana] 

- % Identity: 96.6 

- Alignment Length: 117 

- Location of Alignment in SEQ ID NO 1179: from 1 to 78
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1180 

- Ceres seq_id 1597208 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1181 

- Ceres seq_id 1597209 

- Location of start within SEQ ID NO 1180: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- E1-E2 ATPases 

- Location within SEQ ID NO 1181: from 1 to 71 aa . 

(Dp) Related Amino Acid Sequences

- Alignment No. 575

- gi No. 3549654

- Description: (AL031394) metal-transporting P-type ATPase
(fragment) [Arabidopsis thaliana]

- % Identity: 98.6

- Alignment Length: 71

- Location of Alignment in SEQ ID NO 1181: from 1 to 71

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1182

- Ceres seq_id 1597210

- Location of start within SEQ ID NO 1180: at 28 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- E1-E2 ATPases

- Location within SEQ ID NO 1182: from 1 to 62 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 576

- gi No. 3549654

- Description: (AL031394) metal-transporting P-type ATPase
(fragment) [Arabidopsis thaliana]

- % Identity: 98.6

- Alignment Length: 71

- Location of Alignment in SEQ ID NO 1182: from 1 to 62

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1183

- Ceres seq_id 1597211

- Location of start within SEQ ID NO 1180: at 61 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- E1-E2 ATPases 

- Location within SEQ ID NO 1183: from 1 to 51 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 577 

- gi No. 3549654 

- Description: (AL031394) metal-transporting P-type ATPase 
(fragment) [Arabidopsis thaliana] 

- % Identity: 98.6 

- Alignment Length: 71 

- Location of Alignment in SEQ ID NO 1183: from 1 to 51 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1184 

- Ceres seq_id 1597224 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1185 

- Ceres seq_id 1597225 

- Location of start within SEQ ID NO 1184: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 1185: from 44 to 243 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 578

- gi No. 1170505

- Description: EUKARYOTIC INITIATION FACTOR 4A-2 (EIF-4A-2)
>gi|322504|pir||JC1453 translation initiation factor eIF-4A2 - Arabidopsis
thaliana >gi|16556|emb|CAA46189| (X65053) eukaryotic translation initiation
factor 4A-2 [Arabidopsis thaliana]

- % Identity: 86.5

- Alignment Length: 315 

- Location of Alignment in SEQ ID NO 1185: from 1 to 311 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1186 

- Ceres seq_id 1597226

- Location of start within SEQ ID NO 1184: at 107 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 1186: from 9 to 208 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 579

- gi No. 1170505

- Description: EUKARYOTIC INITIATION FACTOR 4A-2 (EIF-4A-2)
>gi|322504|pir||JC1453 translation initiation factor eIF-4A2 - Arabidopsis
thaliana >gi|16556|emb|CAA46189| (X65053) eukaryotic translation initiation
factor 4A-2 [Arabidopsis thaliana]

- % Identity: 86.5

- Alignment Length: 315

- Location of Alignment in SEQ ID NO 1186: from 1 to 276

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1187 

- Ceres seq_id 1597227 

- Location of start within SEQ ID NO 1184: at 356 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase

- Location within SEQ ID NO 1187: from 1 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 580 

- gi No. 1170505 

- Description: EUKARYOTIC INITIATION FACTOR 4A-2 (EIF-4A-2)
>gi|322504|pir||JC1453 translation initiation factor eIF-4A2 - Arabidopsis
thaliana >gi|16556|emb|CAA46189| (X65053) eukaryotic translation initiation
factor 4A-2 [Arabidopsis thaliana]

- % Identity: 86.5 

- Alignment Length: 315 

- Location of Alignment in SEQ ID NO 1187: from 1 to 193 
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1188 

- Ceres seq_id 1597228 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1189 

- Ceres seq_id 1597229 

- Location of start within SEQ ID NO 1188: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 581 

- gi No. 3047073 

- Description: (AF058825) contains similarity to retrotransposon- 
like proteins [Arabidopsis thaliana] 

- % Identity: 82.9 

- Alignment Length: 41 

- Location of Alignment in SEQ ID NO 1189: from 26 to 66
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1190 

- Ceres seq_id 1597257 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1191 

- Ceres seq_id 1597258 

- Location of start within SEQ ID NO 1190: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 582 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons;
may be a pseudogene [Arabidopsis thaliana]

- % Identity: 75.5

- Alignment Length: 212

- Location of Alignment in SEQ ID NO 1191: from 126 to 337

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1192 

- Ceres seq_id 1597259 

- Location of start within SEQ ID NO 1190: at 280 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 583 

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons; 
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 75.5 

- Alignment Length: 212 

- Location of Alignment in SEQ ID NO 1192: from 33 to 244 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1193 

- Ceres seq_id 1597260 

- Location of start within SEQ ID NO 1190: at 376 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 584

- gi No. 5724774 

- Description: (AF160183) contains similarity to retrotransposons; 
may be a pseudogene [Arabidopsis thaliana] 

- % Identity: 75.5 

- Alignment Length: 212 

- Location of Alignment in SEQ ID NO 1193: from 1 to 212 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1194 

- Ceres seq_id 1597261 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1195 

- Ceres seq_id 1597262 

- Location of start within SEQ ID NO 1194: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Subtilase family

- Location within SEQ ID NO 1195: from 55 to 113 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 585 

- gi No. 4115920 

- Description: (AF118222) similar to the subtilase family of serine
proteases (Pfam: PF00082, score: 45.8, E=1.1e-11, n=2) [Arabidopsis thaliana]

- % Identity: 73.8 

- Alignment Length: 61 

- Location of Alignment in SEQ ID NO 1195: from 120 to 180

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1196

- Ceres seq_id 1597264

- Location of start within SEQ ID NO 1194: at 223 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 586

- gi No. 4115920 

- Description: (AF118222) similar to the subtilase family of serine
proteases (Pfam: PF00082, score: 45.8, E=1.1e-11, n=2) [Arabidopsis thaliana]

- % Identity: 73.8 

- Alignment Length: 61 

- Location of Alignment in SEQ ID NO 1196: from 46 to 106 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1197 

- Ceres seq_id 1597281 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1198 

- Ceres seq_id 1597282 

- Location of start within SEQ ID NO 1197: at 56 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1198: from 1 to 78 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 587 

- gi No. 1709970 

- Description: 60S RIBOSOMAL PROTEIN L10A 

- % Identity: 96 

- Alignment Length: 75

- Location of Alignment in SEQ ID NO 1198: from 1 to 75 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1199

- Ceres seq_id 1597283 

- Location of start within SEQ ID NO 1197: at 379 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1200 

- Ceres seq_id 1597284 

- Location of start within SEQ ID NO 1197: at 406 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1201 

- Ceres seq_id 1597285 
(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 1202 

- Ceres seq_id 1597286

- Location of start within SEQ ID NO 1201: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1202: from 40 to 197 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 588

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 183 

- Location of Alignment in SEQ ID NO 1202: from 28 to 210 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1203 

- Ceres seq_id 1597287 

- Location of start within SEQ ID NO 1201: at 82 nt. 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1203: from 13 to 170 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 589

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8

- Alignment Length: 183

- Location of Alignment in SEQ ID NO 1203: from 1 to 183 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1204 

- Ceres seq_id 1597288 

- Location of start within SEQ ID NO 1201: at 169 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1204: from 1 to 141 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 590

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 183 

- Location of Alignment in SEQ ID NO 1204: from 1 to 154
Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1205 

- Ceres seq_id 1597313 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1206 

- Ceres seq_id 1597314 

- Location of start within SEQ ID NO 1205: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1206: from 1 to 117 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 591

- gi No. 1709970

- Description: 60S RIBOSOMAL PROTEIN L10A

- % Identity: 89.5

- Alignment Length: 124 

- Location of Alignment in SEQ ID NO 1206: from 1 to 117 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1207 

- Ceres seq_id 1597315 

- Location of start within SEQ ID NO 1205: at 179 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1207: from 1 to 58 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 592

- gi No. 1709970

- Description: 60S RIBOSOMAL PROTEIN L10A

- % Identity: 89.5

- Alignment Length: 124

- Location of Alignment in SEQ ID NO 1207: from 1 to 58 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1208 

- Ceres seq_id 1597316 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1209 

- Ceres seq_id 1597317 

- Location of start within SEQ ID NO 1208: at 120 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1209: from 73 to 230 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 593

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 183 

- Location of Alignment in SEQ ID NO 1209: from 61 to 243 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1210 

- Ceres seq_id 1597318 

- Location of start within SEQ ID NO 1208: at 300 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1210: from 13 to 170 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 594

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8 

- Alignment Length: 183 

- Location of Alignment in SEQ ID NO 1210: from 1 to 183 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1211 

- Ceres seq_id 1597319 

- Location of start within SEQ ID NO 1208: at 387 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Purine/pyrimidine phosphoribosyl transferases 

- Location within SEQ ID NO 1211: from 1 to 141 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 595

- gi No. 399046

- Description: ADENINE PHOSPHORIBOSYLTRANSFERASE 1 (APRT)
>gi|99657|pir||S20867 adenine phosphoribosyltransferase (EC 2.4.2.7) -
Arabidopsis thaliana >gi|16164|emb|CAA41497| (X58640) adenine
phosphoribosyltransferase [Arabidopsis thaliana]

- % Identity: 97.8

- Alignment Length: 183

- Location of Alignment in SEQ ID NO 1211: from 1 to 154 
Maximum Length Sequence: 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1212 

- Ceres seq_id 1597344 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1213 

- Ceres seq_id 1597345 

- Location of start within SEQ ID NO 1212: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 1213: from 43 to 201 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 596

- gi No. 1170505

- Description: EUKARYOTIC INITIATION FACTOR 4A-2 (EIF-4A-2)
>gi|322504|pir||JC1453 translation initiation factor eIF-4A2 - Arabidopsis
thaliana >gi|16556|emb|CAA46189| (X65053) eukaryotic translation initiation
factor 4A-2 [Arabidopsis thaliana]

- % Identity: 95.5

- Alignment Length: 201

- Location of Alignment in SEQ ID NO 1213: from 1 to 201 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1214 

- Ceres seq_id 1597346 

- Location of start within SEQ ID NO 1212: at 105 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 1214: from 9 to 167 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 597

- gi No. 1170505

- Description: EUKARYOTIC INITIATION FACTOR 4A-2 (EIF-4A-2)
>gi|322504|pir||JC1453 translation initiation factor eIF-4A2 - Arabidopsis
thaliana >gi|16556|emb|CAA46189| (X65053) eukaryotic translation initiation
factor 4A-2 [Arabidopsis thaliana]

- % Identity: 95.5

- Alignment Length: 201

- Location of Alignment in SEQ ID NO 1214: from 1 to 167 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1215 

- Ceres seq_id 1597347 

- Location of start within SEQ ID NO 1212: at 283 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1216 

- Ceres seq_id 1597360 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1217 

- Ceres seq_id 1597361 

- Location of start within SEQ ID NO 1216: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Monooxygenase 

- Location within SEQ ID NO 1217: from 35 to 233 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 598

- gi No. 3123327

- Description: (AJ005927) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 84.7 

- Alignment Length: 353 

- Location of Alignment in SEQ ID NO 1217: from 1 to 352 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1218 

- Ceres seq_id 1597362 

- Location of start within SEQ ID NO 1216: at 248 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Monooxygenase 

- Location within SEQ ID NO 1218: from 1 to 151 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 599

- gi No. 3123327

- Description: (AJ005927) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 84.7

- Alignment Length: 353

- Location of Alignment in SEQ ID NO 1218: from 1 to 270

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1219

- Ceres seq_id 1597363

- Location of start within SEQ ID NO 1216: at 269 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Monooxygenase

- Location within SEQ ID NO 1219: from 1 to 144 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 600

- gi No. 3123327

- Description: (AJ005927) squalene epoxidase homologue [Arabidopsis
thaliana]

- % Identity: 84.7

- Alignment Length: 353

- Location of Alignment in SEQ ID NO 1219: from 1 to 263

Maximum Length Sequence:

related to:
Clone IDs:

208396

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1220

- Ceres seq_id 1597376
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1221

- Ceres seq_id 1597377

- Location of start within SEQ ID NO 1220: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 601 

- gi No. 3914899 

- Description: 40S RIBOSOMAL PROTEIN S4 >gi|2331301 (AF013487)
ribosomal protein S4 type I [Zea mays]

- % Identity: 89.7

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 1221: from 26 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1222 

- Ceres seq_id 1597378 

- Location of start within SEQ ID NO 1220: at 271 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal family S4e 

- Location within SEQ ID NO 1222: from 1 to 77 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 602 

- gi No. 3914899 

- Description: 40S RIBOSOMAL PROTEIN S4 >gi|2331301 (AF013487)
ribosomal protein S4 type I [Zea mays]

- % Identity: 100

- Alignment Length: 83

- Location of Alignment in SEQ ID NO 1222: from 1 to 77 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1223 

- Ceres seq_id 1597379 

- Location of start within SEQ ID NO 1220: at 281 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

209351 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1224 

- Ceres seq_id 1597392 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1225 

- Ceres seq_id 1597393 

- Location of start within SEQ ID NO 1224: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 60s Acidic ribosomal protein 

- Location within SEQ ID NO 1225: from 34 to 112 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 603 

- gi No. 899610 

- Description: (U29383) acidic ribosomal protein P2 [Zea mays] 

- % Identity: 76.6

- Alignment Length: 94 

- Location of Alignment in SEQ ID NO 1225: from 34 to 120

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1226

- Ceres seq_id 1597394

- Location of start within SEQ ID NO 1224: at 100 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 60s Acidic ribosomal protein

- Location within SEQ ID NO 1226: from 1 to 79 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 604 

- gi No. 899610 

- Description: (U29383) acidic ribosomal protein P2 [Zea mays] 

- % Identity: 76.6

- Alignment Length: 94 

- Location of Alignment in SEQ ID NO 1226: from 1 to 87 

Maximum Length Sequence: 

related to: 
Clone IDs: 

209869 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1227 

- Ceres seq_id 1597396 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1228 

- Ceres seq_id 1597397 

- Location of start within SEQ ID NO 1227: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1229

- Ceres seq_id 1597398 

- Location of start within SEQ ID NO 1227: at 99 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 60s Acidic ribosomal protein 

- Location within SEQ ID NO 1229: from 1 to 94 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 605 

- gi No. 4204376 

- Description: (U62750) acidic ribosomal protein P2a-4 [Zea mays]

- % Identity: 100 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 1229: from 19 to 76

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1230 

- Ceres seq_id 1597399 

- Location of start within SEQ ID NO 1227: at 213 nt . 

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- 60s Acidic ribosomal protein

- Location within SEQ ID NO 1230: from 1 to 56 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 606

- gi No. 4204376 

- Description: (U62750) acidic ribosomal protein P2a-4 [Zea mays] 

- % Identity: 100 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 1230: from 1 to 38 

Maximum Length Sequence: 

related to: 
Clone IDs: 

212251 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1231 

- Ceres seq_id 1597412
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1232 

- Ceres seq_id 1597413 

- Location of start within SEQ ID NO 1231: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 607 

- gi No. 1326372 

- Description: (U58750) Similar to Histone. [Caenorhabditis
elegans]

- % Identity: 76.6 

- Alignment Length: 94 

- Location of Alignment in SEQ ID NO 1232: from 28 to 64 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1233 

- Ceres seq_id 1597414 

- Location of start within SEQ ID NO 1231: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1234 

- Ceres seq_id 1597415 

- Location of start within SEQ ID NO 1231: at 337 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

212303 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1235 

- Ceres seq_id 1597416 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1236 

- Ceres seq_id 1597417

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1237 

- Ceres seq_id 1597418 

- Location of start within SEQ ID NO 1235: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 608

- gi No. 642484 

- Description: (U16371) androgen receptor [Homo sapiens] 

- % Identity: 76.9 

- Alignment Length: 13 

- Location of Alignment in SEQ ID NO 1237: from 64 to 76

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1238 

- Ceres seq_id 1597419 

- Location of start within SEQ ID NO 1235: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

217986 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1239 

- Ceres seq_id 1597426 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1240 

- Ceres seq_id 1597427 

- Location of start within SEQ ID NO 1239: at 137 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 609 

- gi No. 2499334 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 11 KD SUBUNIT 
(COMPLEX I-11KD) (CI-11KD) 

- % Identity: 88.9 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 1240: from 2 to 19 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1241 

- Ceres seq_id 1597428 

- Location of start within SEQ ID NO 1239: at 149 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 610

- gi No. 2499334

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 11 KD SUBUNIT
(COMPLEX I-11KD) (CI-11KD)

- % Identity: 88.9 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 1241: from 1 to 15 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1242 

- Ceres seq_id 1597429 

- Location of start within SEQ ID NO 1239: at 182 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

218075 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1243 

- Ceres seq_id 1597430 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1244 

- Ceres seq_id 1597431 

- Location of start within SEQ ID NO 1243: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic initiation factor 5A hypusine (eIF-5A) 

- Location within SEQ ID NO 1244: from 45 to 187 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 611 

- gi No. 3024018 

- Description: INITIATION FACTOR 5A (EIF-5A) (EIF-4D)
>gi|1546919|emb|CAA69225.1| (Y07920) translation initiation factor 5A [Zea
mays] >gi|2668738 (AF034943) translation initiation factor 5A [Zea mays]

- % Identity: 100 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 1244: from 33 to 192 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1245 

- Ceres seq_id 1597432 

- Location of start within SEQ ID NO 1243: at 98 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic initiation factor 5A hypusine (eIF-5A) 

- Location within SEQ ID NO 1245: from 13 to 155 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 612 

- gi No. 3024018 

- Description: INITIATION FACTOR 5A (EIF-5A) (EIF-4D)
>gi|1546919|emb|CAA69225.1| (Y07920) translation initiation factor 5A [Zea
mays] >gi|2668738 (AF034943) translation initiation factor 5A [Zea mays]

- % Identity: 100 
- Alignment Length: 160

- Location of Alignment in SEQ ID NO 1245: from 1 to 160 

Maximum Length Sequence: 

related to: 
Clone IDs: 

218280 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1246

- Ceres seq_id 1597437 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1247 

- Ceres seq_id 1597438 

- Location of start within SEQ ID NO 1246: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 613 

- gi No. 115815 

- Description: CHLOROPHYLL A-B BINDING PROTEIN OF LHCII TYPE I
PRECURSOR (CAB-M9) (LHCP) >gi|100866|pir||S13098 chlorophyll a/b-binding
protein precursor - maize >gi|22355|emb|CAA39376| (X55892) light-harvesting
chlorophyll a/b binding protein [Zea mays]

- % Identity: 91.4 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 1247: from 22 to 78 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1248

- Ceres seq_id 1597439 

- Location of start within SEQ ID NO 1246: at 65 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 614 

- gi No. 115815 

- Description: CHLOROPHYLL A-B BINDING PROTEIN OF LHCII TYPE I 
PRECURSOR (CAB-M9) (LHCP) >gi|100866|pir||S13098 chlorophyll a/b-binding
protein precursor - maize >gi|22355|emb|CAA39376| (X55892) light-harvesting
chlorophyll a/b binding protein [Zea mays] 

- % Identity: 91.4 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 1248: from 1 to 57 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1249 

- Ceres seq_id 1597440 

- Location of start within SEQ ID NO 1246: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 615 

- gi No. 115815 

- Description: CHLOROPHYLL A-B BINDING PROTEIN OF LHCII TYPE I 
PRECURSOR (CAB-M9) (LHCP) >gi|100866|pir||S13098 chlorophyll a/b-binding
protein precursor - maize >gi|22355|emb|CAA39376| (X55892) light-harvesting
chlorophyll a/b binding protein [Zea mays] 

- % Identity: 91.4 
- Alignment Length: 58

- Location of Alignment in SEQ ID NO 1249: from 1 to 52



Maximum Length Sequence: 

related to: 
Clone IDs: 

218874 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1250 

- Ceres seq_id 1597452
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1251 

- Ceres seq_id 1597453 

- Location of start within SEQ ID NO 1250: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 616 

- gi No. 1169528 

- Description: ENOLASE 2 (2-PHOSPHOGLYCERATE DEHYDRATASE 2) (2-
PHOSPHO-D-GLYCERATE HYDRO-LYASE 2) >gi|602253 (U17973) enolase [Zea mays]

- % Identity: 100 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1251: from 18 to 42 



Maximum Length Sequence: 

related to: 
Clone IDs: 

219095 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1252 

- Ceres seq_id 1597456 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1253 

- Ceres seq_id 1597457 

- Location of start within SEQ ID NO 1252: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 617 

- gi No. 464466 

- Description: PROFILIN 1 >gi|422031|pir||S35796 profilin 1 - maize
>gi|313138|emb|CAA51718| (X73279) profilin 1 [Zea mays]

- % Identity: 100 

- Alignment Length: 29 

- Location of Alignment in SEQ ID NO 1253: from 72 to 100

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1254 

- Ceres seq_id 1597458 

- Location of start within SEQ ID NO 1252: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Profilins 

- Location within SEQ ID NO 1254: from 42 to 147 aa.



(Dp) Related Amino Acid Sequences 
- Alignment No. 618 
- gi No. 464466

- Description: PROFILIN 1 >gi|422031|pir||S35796 profilin 1 - maize
>gi|313138|emb|CAA51718| (X73279) profilin 1 [Zea mays]

- % Identity: 71.3 

- Alignment Length: 109

- Location of Alignment in SEQ ID NO 1254: from 41 to 147 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1255 

- Ceres seq_id 1597459 

- Location of start within SEQ ID NO 1252: at 123 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Profilins 

- Location within SEQ ID NO 1255: from 2 to 107 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 619 

- gi No. 464466 

- Description: PROFILIN 1 >gi|422031|pir||S35796 profilin 1 - maize
>gi|313138|emb|CAA51718| (X73279) profilin 1 [Zea mays]

- % Identity: 71.3 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 1255: from 1 to 107 



Maximum Length Sequence: 

related to: 
Clone IDs: 

219884 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1256 

- Ceres seq_id 1597476
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1257 

- Ceres seq_id 1597477

- Location of start within SEQ ID NO 1256: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Bowman-Birk serine protease inhibitor family 

- Location within SEQ ID NO 1257: from 36 to 87 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

219975 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1258 

- Ceres seq_id 1597478 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1259 

- Ceres seq_id 1597479 

- Location of start within SEQ ID NO 1258: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 
- Alignment No. 620 
- gi No. 100883

- Description: heat shock protein 17.2 - maize
>gi|22335|emb|CAA46641| (X65725) heat shock protein 17.2 [Zea mays]

- % Identity: 100 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 1259: from 124 to 134 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1260 

- Ceres seq_id 1597480 

- Location of start within SEQ ID NO 1258: at 110 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp20/alpha crystallin family 

- Location within SEQ ID NO 1260: from 41 to 93 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 621 

- gi No. 100883 

- Description: heat shock protein 17.2 - maize
>gi|22335|emb|CAA46641| (X65725) heat shock protein 17.2 [Zea mays]

- % Identity: 88.2 

- Alignment Length: 93 

- Location of Alignment in SEQ ID NO 1260: from 1 to 93 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1261 

- Ceres seq_id 1597481 

- Location of start within SEQ ID NO 1258: at 152 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Hsp20/alpha crystallin family 

- Location within SEQ ID NO 1261: from 27 to 79 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 622 

- gi No. 100883 

- Description: heat shock protein 17.2 - maize
>gi|22335|emb|CAA46641| (X65725) heat shock protein 17.2 [Zea mays]

- % Identity: 88.2 

- Alignment Length: 93 

- Location of Alignment in SEQ ID NO 1261: from 1 to 79 

Maximum Length Sequence: 

related to: 
Clone IDs: 

219985 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1262 

- Ceres seq_id 1597482 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1263 

- Ceres seq_id 1597483

- Location of start within SEQ ID NO 1262: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 1263: from 32 to 153 aa.
(Dp) Related Amino Acid Sequences

- Alignment No. 623 

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 96.8 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1263: from 30 to 153 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1264

- Ceres seq_id 1597484 

- Location of start within SEQ ID NO 1262: at 89 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 14-3-3 proteins 

- Location within SEQ ID NO 1264: from 3 to 124 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 624 

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 96.8 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1264: from 1 to 124 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1265 

- Ceres seq_id 1597485 

- Location of start within SEQ ID NO 1262: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 14-3-3 proteins 

- Location within SEQ ID NO 1265: from 1 to 116 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 625 

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 96.8 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1265: from 1 to 116 



Maximum Length Sequence: 

related to: 
Clone IDs: 

221197 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1266

- Ceres seq_id 1597499 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1267 

- Ceres seq_id 1597500 

- Location of start within SEQ ID NO 1266: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
- Alignment No. 626

- gi No. 112110

- Description: p23 protein - rat >gi|246868|bbs|90835 nucleic acid-
binding protein p23=yeast ribosomal protein YL43 homolog [rats, liver,
Peptide Partial, 28 aa] 

- % Identity: 74.1 

- Alignment Length: 27 

- Location of Alignment in SEQ ID NO 1267: from 2 to 28

Maximum Length Sequence: 

related to: 
Clone IDs: 

222572 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1268 

- Ceres seq_id 1597502 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1269 

- Ceres seq_id 1597503

- Location of start within SEQ ID NO 1268: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 1269: from 88 to 139 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1270

- Ceres seq_id 1597504 

- Location of start within SEQ ID NO 1268: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- SRF-type transcription factor (DNA-binding and dimerisation domain)

- Location within SEQ ID NO 1270: from 58 to 93 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 627 

- gi No. 81611 

- Description: floral homeotic protein AGL3 - Arabidopsis thaliana
(fragment)

- % Identity: 74.1 

- Alignment Length: 27 

- Location of Alignment in SEQ ID NO 1270: from 58 to 84 

Maximum Length Sequence: 

related to: 
Clone IDs: 

223016 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1271 

- Ceres seq_id 1597519 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1272 

- Ceres seq_id 1597520

- Location of start within SEQ ID NO 1271: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1273 

- Ceres seq_id 1597521 

- Location of start within SEQ ID NO 1271: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1273: from 4 to 59 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 628 

- gi No. 729671 

- Description: HISTONE H2A >gi|473603 (U08225) histone H2A [Zea mays]

- % Identity: 100 

- Alignment Length: 54 

- Location of Alignment in SEQ ID NO 1273: from 1 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1274 

- Ceres seq_id 1597522 

- Location of start within SEQ ID NO 1271: at 327 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

218282 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1275 

- Ceres seq_id 1597540 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1276 

- Ceres seq_id 1597541 

- Location of start within SEQ ID NO 1275: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1277 

- Ceres seq_id 1597542 

- Location of start within SEQ ID NO 1275: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1277: from 8 to 88 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 629 

- gi No. 70712 

- Description: histone H2A.3 - wheat (fragment) 

- % Identity: 85.7 
- Alignment Length: 28

- Location of Alignment in SEQ ID NO 1277: from 17 to 44 

Maximum Length Sequence: 

related to: 
Clone IDs: 

221082 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1278 

- Ceres seq_id 1597553 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1279

- Ceres seq_id 1597554

- Location of start within SEQ ID NO 1278: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1280 

- Ceres seq_id 1597555 

- Location of start within SEQ ID NO 1278: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1281 

- Ceres seq_id 1597556 

- Location of start within SEQ ID NO 1278: at 198 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome c 

- Location within SEQ ID NO 1281: from 10 to 93 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 630 

- gi No. 118011 

- Description: CYTOCHROME C >gi|625189|pir||CCRZ cytochrome c -
rice >gi|169786 (M63704) cytochrome c [Oryza sativa] >gi|218249|dbj|BAA02159|
(D12634) 'cytochrome C' [Oryza sativa]

- % Identity: 97.9 

- Alignment Length: 94 

- Location of Alignment in SEQ ID NO 1281: from 1 to 93 

Maximum Length Sequence:

related to: 
Clone IDs: 

225408 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1282 

- Ceres seq_id 1597579 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1283

- Ceres seq_id 1597580

- Location of start within SEQ ID NO 1282: at 1 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1284 

- Ceres seq_id 1597581 

- Location of start within SEQ ID NO 1282: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1285 

- Ceres seq_id 1597582 

- Location of start within SEQ ID NO 1282: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 631 

- gi No. 1326372 

- Description: (U58750) Similar to Histone. [Caenorhabditis elegans]

- % Identity: 94.1 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 1285: from 22 to 37 

Maximum Length Sequence: 

related to: 
Clone IDs: 

225493 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1286

- Ceres seq_id 1597589 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1287 

- Ceres seq_id 1597590 

- Location of start within SEQ ID NO 1286: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ribosomal L5P family C-terminus 

- Location within SEQ ID NO 1287: from 90 to 159 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 632 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 96.3 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 1287: from 25 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1288 

- Ceres seq_id 1597591

- Location of start within SEQ ID NO 1286: at 75 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- ribosomal L5P family C-terminus 

- Location within SEQ ID NO 1288: from 66 to 135 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 633 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 96.3 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 1288: from 1 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1289

- Ceres seq_id 1597592 

- Location of start within SEQ ID NO 1286: at 105 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ribosomal L5P family C-terminus 

- Location within SEQ ID NO 1289: from 56 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 634 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 96.3 

- Alignment Length: 135 

- Location of Alignment in SEQ ID NO 1289: from 1 to 125 

Maximum Length Sequence: 

related to: 
Clone IDs: 

226513 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1290

- Ceres seq_id 1597600 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1291 

- Ceres seq_id 1597601 

- Location of start within SEQ ID NO 1290: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 635 

- gi No. 2662415 

- Description: (U97494) metallothionein-like protein [Prunus armeniaca]

- % Identity: 71.4 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1291: from 31 to 51 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1292 

- Ceres seq_id 1597602 

- Location of start within SEQ ID NO 1290: at 159 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1293 

- Ceres seq_id 1597603 

- Location of start within SEQ ID NO 1290: at 171 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

228402 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1294

- Ceres seq_id 1597620
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1295 

- Ceres seq_id 1597621 

- Location of start within SEQ ID NO 1294: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 636 

- gi No. 2134213 

- Description: protamine I - American alligator 

- % Identity: 76.9 

- Alignment Length: 13 

- Location of Alignment in SEQ ID NO 1295: from 77 to 89 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1296 

- Ceres seq_id 1597622 

- Location of start within SEQ ID NO 1294: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 637 

- gi No. 1173350 

- Description: DNA BINDING PROTEIN S1FA

- % Identity: 79.2 

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 1296: from 34 to 57 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1297 

- Ceres seq_id 1597623 

- Location of start within SEQ ID NO 1294: at 158 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

228477 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1298 

- Ceres seq_id 1597631 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1299 

- Ceres seq_id 1597632 

- Location of start within SEQ ID NO 1298: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1300 

- Ceres seq_id 1597633 

- Location of start within SEQ ID NO 1298: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1301 

- Ceres seq_id 1597634 

- Location of start within SEQ ID NO 1298: at 86 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1301: from 4 to 59 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 638 

- gi No. 122007 

- Description: HISTONE H2A >gi|100161|pir||S11498 histone H2A -
parsley >gi|20448|emb|CAA37828| (X53831) H2A histone protein (AA 1 - 14:
[Petroselinum crispum] 

- % Identity: 86.7 

- Alignment Length: 45

- Location of Alignment in SEQ ID NO 1301: from 10 to 54 

Maximum Length Sequence: 

related to: 
Clone IDs: 

228554 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1302 

- Ceres seq_id 1597639 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1303 

- Ceres seq_id 1597640

- Location of start within SEQ ID NO 1302: at 2 nt.
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 639

- gi No. 100856 

- Description: b-32 protein - maize >gi|22188|emb|CAA49722|
(X70153) protein b-32 [Zea mays] 

- % Identity: 82.1

- Alignment Length: 28

- Location of Alignment in SEQ ID NO 1303: from 25 to 52 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1304 

- Ceres seq_id 1597641 

- Location of start within SEQ ID NO 1302: at 5 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 640 

- gi No. 100856 

- Description: b-32 protein - maize >gi|22188|emb|CAA49722|
(X70153) protein b-32 [Zea mays] 

- % Identity: 82.1 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1304: from 24 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1305 

- Ceres seq_id 1597642 

- Location of start within SEQ ID NO 1302: at 263 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 641 

- gi No. 496164 

- Description: (L26305) ribosome-inactivating protein [Zea
mays] >gi|1096509|prf||2111429A ribosome-inactivating protein [Zea mays]

- % Identity: 92.9 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1305: from 34 to 61 

Maximum Length Sequence: 

related to: 
Clone IDs: 

229046 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1306 

- Ceres seq_id 1597658 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1307 

- Ceres seq_id 1597659 

- Location of start within SEQ ID NO 1306: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1308 
- Ceres seq_id 1597660

- Location of start within SEQ ID NO 1306: at 115 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1309 

- Ceres seq_id 1597661 

- Location of start within SEQ ID NO 1306: at 156 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp20/alpha crystallin family 

- Location within SEQ ID NO 1309: from 28 to 62 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 642 

- gi No. 100883 

- Description: heat shock protein 17.2 - maize
>gi|22335|emb|CAA46641| (X65725) heat shock protein 17.2 [Zea mays]

- % Identity: 98.7 

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 1309: from 1 to 71 

Maximum Length Sequence: 

related to: 
Clone IDs: 

229243 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1310 

- Ceres seq_id 1597666 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1311 

- Ceres seq_id 1597667

- Location of start within SEQ ID NO 1310: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1312 

- Ceres seq_id 1597668 

- Location of start within SEQ ID NO 1310: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1313 

- Ceres seq_id 1597669 

- Location of start within SEQ ID NO 1310: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 643 

- gi No. 5106775 
- Description: (AF067732) ribosomal protein S12 [Hordeum vulgare]

- % Identity: 86.7

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 1313: from 1 to 30 
Maximum Length Sequence:

related to: 
Clone IDs: 

229888 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1314 

- Ceres seq_id 1597674 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1315 

- Ceres seq_id 1597675 

- Location of start within SEQ ID NO 1314: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S11

- Location within SEQ ID NO 1315: from 28 to 98 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 644

- gi No. 131772 

- Description: 40S RIBOSOMAL PROTEIN S14 (CLONE MCH1)
>gi|82723|pir||A30097 ribosomal protein S14 (clone MCH1) - maize

- % Identity: 99 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 1315: from 1 to 98 



Maximum Length Sequence: 

related to: 
Clone IDs: 

224490 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1316 

- Ceres seq_id 1597698 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1317 

- Ceres seq_id 1597699 

- Location of start within SEQ ID NO 1316: at 124 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1317: from 17 to 91 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 645

- gi No. 1053059 

- Description: (U38423) histone H3 [Triticum aestivum] 

- % Identity: 97.8 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 1317: from 1 to 91 



Maximum Length Sequence:

related to: 
Clone IDs: 

230129 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 1318

- Ceres seq_id 1597726 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1319 

- Ceres seq_id 1597727 

- Location of start within SEQ ID NO 1318: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 646

- gi No. 2492519 

- Description: 26S PROTEASE REGULATORY SUBUNIT 7 (26S PROTEASOME
SUBUNIT 7) >gi|1395191|dbj|BAA13021| (D86121) 26S proteasome ATPase subunit
[Spinacia oleracea] 

- % Identity: 90.2 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 1319: from 43 to 163 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1320 

- Ceres seq_id 1597728 

- Location of start within SEQ ID NO 1318: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 647 

- gi No. 2492519 

- Description: 26S PROTEASE REGULATORY SUBUNIT 7 (26S PROTEASOME
SUBUNIT 7) >gi|1395191|dbj|BAA13021| (D86121) 26S proteasome ATPase subunit
[Spinacia oleracea] 

- % Identity: 90.2 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 1320: from 1 to 118 

Maximum Length Sequence: 

related to: 
Clone IDs: 

230721 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1321 

- Ceres seq_id 1597732
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1322 

- Ceres seq_id 1597733 

- Location of start within SEQ ID NO 1321: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 648

- gi No. 4666287 

- Description: (D85764) cytosolic monodehydroascorbate reductase
[Oryza sativa]

- % Identity: 93 

- Alignment Length: 128 

- Location of Alignment in SEQ ID NO 1322: from 1 to 128 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1323 
- Ceres seq_id 1597734

- Location of start within SEQ ID NO 1321: at 105 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

232971 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1324 

- Ceres seq_id 1597741
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1325 

- Ceres seq_id 1597742 

- Location of start within SEQ ID NO 1324: at 86 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S8 

- Location within SEQ ID NO 1325: from 5 to 114 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 649

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 96.5 

- Alignment Length: 115 

- Location of Alignment in SEQ ID NO 1325: from 1 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1326 

- Ceres seq_id 1597743

- Location of start within SEQ ID NO 1324: at 125 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 

- Location within SEQ ID NO 1326: from 1 to 101 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 650 

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 96.5 

- Alignment Length: 115 

- Location of Alignment in SEQ ID NO 1326: from 1 to 101 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1327 

- Ceres seq_id 1597744

- Location of start within SEQ ID NO 1324: at 161 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
- Ribosomal protein S8

- Location within SEQ ID NO 1327: from 1 to 89 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 651 

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 96.5 

- Alignment Length: 115 

- Location of Alignment in SEQ ID NO 1327: from 1 to 89

Maximum Length Sequence: 

related to: 
Clone IDs: 

232976 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1328 

- Ceres seq_id 1597745 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1329 

- Ceres seq_id 1597746 

- Location of start within SEQ ID NO 1328: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1330 

- Ceres seq_id 1597747 

- Location of start within SEQ ID NO 1328: at 102 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1330: from 5 to 91 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 652

- gi No. 3204129 

- Description: (AJ006768) histone H2A [Cicer arietinum] 

- % Identity: 93.2 

- Alignment Length: 88

- Location of Alignment in SEQ ID NO 1330: from 5 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

233094 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1331 

- Ceres seq_id 1597748 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1332 

- Ceres seq_id 1597749 

- Location of start within SEQ ID NO 1331: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic ribosomal protein L18

- Location within SEQ ID NO 1332: from 30 to 82 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 653 

- gi No. 1172977 

- Description: 60S RIBOSOMAL PROTEIN L18 >gi|606970 (U15741)
cytoplasmic ribosomal protein L18 [Arabidopsis thaliana] 

- % Identity: 84.1 

- Alignment Length: 63 

- Location of Alignment in SEQ ID NO 1332: from 20 to 82 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1333 

- Ceres seq_id 1597750 

- Location of start within SEQ ID NO 1331: at 59 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Eukaryotic ribosomal protein L18 

- Location within SEQ ID NO 1333: from 11 to 63 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 654 

- gi No. 1172977 

- Description: 60S RIBOSOMAL PROTEIN L18 >gi|606970 (U15741)
cytoplasmic ribosomal protein L18 [Arabidopsis thaliana] 

- % Identity: 84.1 

- Alignment Length: 63 

- Location of Alignment in SEQ ID NO 1333: from 1 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1334 

- Ceres seq_id 1597751 

- Location of start within SEQ ID NO 1331: at 249 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic ribosomal protein L18 

- Location within SEQ ID NO 1334: from 1 to 68 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 655 

- gi No. 1172977 

- Description: 60S RIBOSOMAL PROTEIN L18 >gi|606970 (U15741)
cytoplasmic ribosomal protein L18 [Arabidopsis thaliana]

- % Identity: 76.8

- Alignment Length: 69 

- Location of Alignment in SEQ ID NO 1334: from 1 to 68 

Maximum Length Sequence:

related to: 
Clone IDs: 

233109 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1335 

- Ceres seq_id 1597752 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1336 

- Ceres seq_id 1597753 

- Location of start within SEQ ID NO 1335: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1337 

- Ceres seq_id 1597754 

- Location of start within SEQ ID NO 1335: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zn-finger in Ran binding protein and others. 

- Location within SEQ ID NO 1337: from 4 to 31 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 656

- gi No. 5726567 

- Description: (AF169205) glycine-rich RNA-binding protein [Glycine
max]

- % Identity: 85.7 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1337: from 36 to 49

Maximum Length Sequence: 

related to: 
Clone IDs: 

233186 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1338 

- Ceres seq_id 1597755 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1339 

- Ceres seq_id 1597756 

- Location of start within SEQ ID NO 1338: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 657 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 89.3 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1339: from 35 to 62 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1340 

- Ceres seq_id 1597757 

- Location of start within SEQ ID NO 1338: at 105 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 658 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 89.3 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1340: from 1 to 28 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1341 

- Ceres seq_id 1597758 

- Location of start within SEQ ID NO 1338: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 659 

- gi No. 1173055 

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa
>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 89.3 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1341: from 1 to 18 

Maximum Length Sequence: 

related to: 
Clone IDs: 

238005 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1342 

- Ceres seq_id 1597797 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1343 

- Ceres seq_id 1597798 

- Location of start within SEQ ID NO 1342: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1344 

- Ceres seq_id 1597799 

- Location of start within SEQ ID NO 1342: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L11

- Location within SEQ ID NO 1344: from 43 to 100 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 660

- gi No. 2677830 

- Description: (U93168) ribosomal protein L12 [Prunus armeniaca] 

- % Identity: 93.5 

- Alignment Length: 46

- Location of Alignment in SEQ ID NO 1344: from 44 to 89



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1345 

- Ceres seq_id 1597800 
- Location of start within SEQ ID NO 1342: at 58 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

238199 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1346 

- Ceres seq_id 1597805 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1347 

- Ceres seq_id 1597806 

- Location of start within SEQ ID NO 1346: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 

- Location within SEQ ID NO 1347: from 35 to 121 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 661 

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1347: from 31 to 121 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1348 

- Ceres seq_id 1597807 

- Location of start within SEQ ID NO 1346: at 93 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 

- Location within SEQ ID NO 1348: from 5 to 91 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 662 

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1348: from 1 to 91



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1349 

- Ceres seq_id 1597808 

- Location of start within SEQ ID NO 1346: at 132 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 
- Location within SEQ ID NO 1349: from 1 to 78 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 663 

- gi No. 1173218 

- Description: 40S RIBOSOMAL PROTEIN S15A >gi|440824 (L27461)
ribosomal protein S15 [Arabidopsis thaliana] >gi|2150130 (AF001412)
cytoplasmic ribosomal protein S15a [Arabidopsis thaliana] 

- % Identity: 93.4 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1349: from 1 to 78 

Maximum Length Sequence: 

related to: 
Clone IDs: 

238282 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1350 

- Ceres seq_id 1597809 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1351 

- Ceres seq_id 1597810 

- Location of start within SEQ ID NO 1350: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1352 

- Ceres seq_id 1597811 

- Location of start within SEQ ID NO 1350: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 664 

- gi No. 498902 

- Description: (U10044) ribosomal protein L27 homolog [Pisum
sativum]

- % Identity: 90 

- Alignment Length: 20 

- Location of Alignment in SEQ ID NO 1352: from 25 to 43 

Maximum Length Sequence: 

related to: 
Clone IDs: 

238494 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1353 

- Ceres seq_id 1597813 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1354 

- Ceres seq_id 1597814 

- Location of start within SEQ ID NO 1353: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 665 

- gi No. 2058273 
- Description: (D83527) YK426 [Oryza sativa]

- % Identity: 78.1 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1354: from 32 to 62 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1355 

- Ceres seq_id 1597815 

- Location of start within SEQ ID NO 1353: at 94 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 666 

- gi No. 2058273 

- Description: (D83527) YK426 [Oryza sativa] 

- % Identity: 78.1 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1355: from 1 to 31 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1356 

- Ceres seq_id 1597816 

- Location of start within SEQ ID NO 1353: at 263 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

238568 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1357 

- Ceres seq_id 1597821 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1358 

- Ceres seq_id 1597822 

- Location of start within SEQ ID NO 1357: at 88 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 667 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga
menziesii]

- % Identity: 91.4 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1358: from 1 to 81 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1359 

- Ceres seq_id 1597823 

- Location of start within SEQ ID NO 1357: at 172 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 668 
- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga
menziesii]

- % Identity: 91.4 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1359: from 1 to 53

Maximum Length Sequence: 

related to: 
Clone IDs: 

238840 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1360 

- Ceres seq_id 1597825 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1361 

- Ceres seq_id 1597826 

- Location of start within SEQ ID NO 1360: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 1361: from 48 to 137 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 669 

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 100 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 1361: from 48 to 137 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1362 

- Ceres seq_id 1597827

- Location of start within SEQ ID NO 1360: at 144 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 1362: from 1 to 90 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 670

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 100 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 1362: from 1 to 90 

Maximum Length Sequence: 

related to: 
Clone IDs: 

239101 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1363 

- Ceres seq_id 1597830 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1364 

- Ceres seq_id 1597831 

- Location of start within SEQ ID NO 1363: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1365 

- Ceres seq_id 1597832

- Location of start within SEQ ID NO 1363: at 100 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S12e 

- Location within SEQ ID NO 1365: from 9 to 73 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1366 

- Ceres seq_id 1597833 

- Location of start within SEQ ID NO 1363: at 343 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 671 

- gi No. 4506683 

- Description: ref|NP_001007.1|pRPS12| ribosomal protein S12
>gi|133742|sp|P25398|RS12_HUMAN 40S RIBOSOMAL PROTEIN S12
>gi|70948|pir||R3HU12 ribosomal protein S12 - human >gi|36146|emb|CAA37582|
(X53505) ribosomal protein S12 [Homo sapiens]

- % Identity: 73.9 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1366: from 11 to 33 

Maximum Length Sequence:

related to: 
Clone IDs: 

239625 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1367 

- Ceres seq_id 1597848 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1368 

- Ceres seq_id 1597849 

- Location of start within SEQ ID NO 1367: at 143 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 1368: from 52 to 112 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 672

- gi No. 4033424 

- Description: SOLUBLE INORGANIC PYROPHOSPHATASE (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE) >gi|2668746 (AF034947) inorganic pyrophosphatase
[Zea mays] 

- % Identity: 99.1 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1368: from 1 to 112 
Maximum Length Sequence:

related to: 
Clone IDs: 

239759 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1369 

- Ceres seq_id 1597850 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1370 

- Ceres seq_id 1597851 

- Location of start within SEQ ID NO 1369: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 673 

- gi No. 224970 

- Description: heat shock protein hsp70 [Zea mays] 

- % Identity: 88 

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 1370: from 94 to 118 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1371 

- Ceres seq_id 1597852 

- Location of start within SEQ ID NO 1369: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 674

- gi No. 224970 

- Description: heat shock protein hsp70 [Zea mays] 

- % Identity: 92.9 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1371: from 74 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1372 

- Ceres seq_id 1597853

- Location of start within SEQ ID NO 1369: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp70 protein 

- Location within SEQ ID NO 1372: from 8 to 66 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 675

- gi No. 224970 

- Description: heat shock protein hsp70 [Zea mays] 

- % Identity: 91.9 

- Alignment Length: 37 

- Location of Alignment in SEQ ID NO 1372: from 1 to 37 



Maximum Length Sequence: 

related to: 
Clone IDs: 

239868 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 1373

- Ceres seq_id 1597854 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1374 

- Ceres seq_id 1597855 

- Location of start within SEQ ID NO 1373: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 1374: from 47 to 125 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1375 

- Ceres seq_id 1597856 

- Location of start within SEQ ID NO 1373: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 1375: from 27 to 105 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1376

- Ceres seq_id 1597857

- Location of start within SEQ ID NO 1373: at 99 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pollen allergen

- Location within SEQ ID NO 1376: from 15 to 93 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

237653 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 1377 
- Ceres seq_id 1597871
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1378 

- Ceres seq_id 1597872 

- Location of start within SEQ ID NO 1377: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 676

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution

- % Identity: 93.8 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 1378: from 47 to 62 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1379 

- Ceres seq_id 1597873 

- Location of start within SEQ ID NO 1377: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1379: from 29 to 110 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 677 

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core 
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution 

- % Identity: 89.8 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 1379: from 63 to 110

Maximum Length Sequence: 

related to: 
Clone IDs: 

238756 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1380 

- Ceres seq_id 1597878 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1381 

- Ceres seq_id 1597879 

- Location of start within SEQ ID NO 1380: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1381: from 61 to 155 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 678

- gi No. 1346251 

- Description: HISTONE H2B.4 >gi|577819|emb|CAA49585| (X69961) H2B
histone [Zea mays] 

- % Identity: 92.2 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 1381: from 53 to 155 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1382

- Ceres seq_id 1597880

- Location of start within SEQ ID NO 1380: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Nuclear transition protein 2 

- Location within SEQ ID NO 1382: from 12 to 77 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 
related to: 
Clone IDs:

239917 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1383 

- Ceres seq_id 1597881
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1384 

- Ceres seq_id 1597882 

- Location of start within SEQ ID NO 1383: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 679

- gi No. 730456

- Description: 40S RIBOSOMAL PROTEIN S19

- % Identity: 96.2 

- Alignment Length: 26 

- Location of Alignment in SEQ ID NO 1384: from 24 to 48 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1385 

- Ceres seq_id 1597883 

- Location of start within SEQ ID NO 1383: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

240016 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1386

- Ceres seq_id 1597890 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1387 

- Ceres seq_id 1597891 

- Location of start within SEQ ID NO 1386: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1388 

- Ceres seq_id 1597892 

- Location of start within SEQ ID NO 1386: at 59 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 680

- gi No. 3135543 

- Description: (AF062393) aquaporin [Oryza sativa] 

- % Identity: 86.1 

- Alignment Length: 36 

- Location of Alignment in SEQ ID NO 1388: from 1 to 36 
Maximum Length Sequence:

related to:
Clone IDs: 

240230 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1389 

- Ceres seq_id 1597900 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1390 

- Ceres seq_id 1597901 

- Location of start within SEQ ID NO 1389: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 681 

- gi No. 585963 

- Description: PROTEIN TRANSPORT PROTEIN SEC61 GAMMA SUBUNIT 

- % Identity: 97.4 

- Alignment Length: 39

- Location of Alignment in SEQ ID NO 1390: from 57 to 95 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1391 

- Ceres seq_id 1597902 

- Location of start within SEQ ID NO 1389: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 682 

- gi No. 585963 

- Description: PROTEIN TRANSPORT PROTEIN SEC61 GAMMA SUBUNIT 

- % Identity: 90.3 

- Alignment Length: 31 

- Location of Alignment in SEQ ID NO 1391: from 27 to 57 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1392 

- Ceres seq_id 1597903 

- Location of start within SEQ ID NO 1389: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

240401 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1393 

- Ceres seq_id 1597905 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1394 

- Ceres seq_id 1597906 

- Location of start within SEQ ID NO 1393: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1395 

- Ceres seq_id 1597907 

- Location of start within SEQ ID NO 1393: at 125 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 683 

- gi No. 3176668 

- Description: (AC004393) Similar to ribosomal protein L17
gb|X62724 from Hordeum vulgare. ESTs gb|Z34728, gb|F19974, gb|T75677 and
gb|Z33937 come from this gene. [Arabidopsis thaliana]

- % Identity: 79.2 

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 1395: from 1 to 24 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1396 

- Ceres seq_id 1597908 

- Location of start within SEQ ID NO 1393: at 272 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

240703 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1397 

- Ceres seq_id 1597913 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1398 

- Ceres seq_id 1597914 

- Location of start within SEQ ID NO 1397: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1399 

- Ceres seq_id 1597915 

- Location of start within SEQ ID NO 1397: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 1399: from 63 to 126 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1400 

- Ceres seq_id 1597916 

- Location of start within SEQ ID NO 1397: at 179 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 1400: from 4 to 67 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

240874 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1401 

- Ceres seq_id 1597917 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1402

- Ceres seq_id 1597918 

- Location of start within SEQ ID NO 1401: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 684 

- gi No. 4914432 

- Description: (AL050351) ribosomal protein S25 [Arabidopsis
thaliana]

- % Identity: 88.5

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1402: from 58 to 83 

Maximum Length Sequence: 

related to: 
Clone IDs: 

241414 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1403

- Ceres seq_id 1597937 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1404 

- Ceres seq_id 1597938 

- Location of start within SEQ ID NO 1403: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- EF hand 

- Location within SEQ ID NO 1404: from 49 to 77 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 685 

- gi No. 20186 

- Description: (X65016) calmodulin [Oryza sativa]
>gi|3336950|emb|CAA74307| (Y13974) calmodulin [Zea mays] >gi|4103961
(AF030034) calmodulin [Phaseolus vulgaris]

- % Identity: 99.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1404: from 38 to 161 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1405 

- Ceres seq_id 1597939 

- Location of start within SEQ ID NO 1403: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- EF hand 

- Location within SEQ ID NO 1405: from 12 to 40 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 686

- gi No. 20186 

- Description: (X65016) calmodulin [Oryza sativa]
>gi|3336950|emb|CAA74307| (Y13974) calmodulin [Zea mays] >gi|4103961
(AF030034) calmodulin [Phaseolus vulgaris] 

- % Identity: 99.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1405: from 1 to 124 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1406 

- Ceres seq_id 1597940

- Location of start within SEQ ID NO 1403: at 222 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 687 

- gi No. 20186 

- Description: (X65016) calmodulin [Oryza sativa]
>gi|3336950|emb|CAA74307| (Y13974) calmodulin [Zea mays] >gi|4103961
(AF030034) calmodulin [Phaseolus vulgaris]

- % Identity: 99.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 1406: from 1 to 88 

Maximum Length Sequence: 

related to: 
Clone IDs: 

241555 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1407 

- Ceres seq_id 1597941 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1408 

- Ceres seq_id 1597942 

- Location of start within SEQ ID NO 1407: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ATP synthase 

- Location within SEQ ID NO 1408: from 1 to 159 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 688 

- gi No. 543867 

- Description: ATP SYNTHASE GAMMA CHAIN, MITOCHONDRIAL PRECURSOR
>gi|1076684|pir||A47493 H+-transporting ATP synthase (EC 3.6.1.34) gamma
chain precursor - sweet potato >gi|303626|dbj|BAA03526| (D14699) F1-ATPase
gamma subunit [Ipomoea batatas]

- % Identity: 78 

- Alignment Length: 159

- Location of Alignment in SEQ ID NO 1408: from 1 to 159 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1409

- Ceres seq_id 1597943 

- Location of start within SEQ ID NO 1407: at 188 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ATP synthase 

- Location within SEQ ID NO 1409: from 1 to 97 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 689

- gi No. 543867 

- Description: ATP SYNTHASE GAMMA CHAIN, MITOCHONDRIAL PRECURSOR
>gi|1076684|pir||A47493 H+-transporting ATP synthase (EC 3.6.1.34) gamma
chain precursor - sweet potato >gi|303626|dbj|BAA03526| (D14699) F1-ATPase
gamma subunit [Ipomoea batatas]

- % Identity: 78 

- Alignment Length: 159 

- Location of Alignment in SEQ ID NO 1409: from 1 to 97 

Maximum Length Sequence: 

related to: 
Clone IDs: 

242413 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1410 

- Ceres seq_id 1597958 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1411 

- Ceres seq_id 1597959

- Location of start within SEQ ID NO 1410: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 690 

- gi No. 4741896 

- Description: (AF127042) 60S ribosomal protein L37a [Gossypium hirsutum]

- % Identity: 90.7 

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 1411: from 1 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1412 

- Ceres seq_id 1597960 

- Location of start within SEQ ID NO 1410: at 171 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 691 

- gi No. 4741896 

- Description: (AF127042) 60S ribosomal protein L37a [Gossypium hirsutum]

- % Identity: 90.7 

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 1412: from 1 to 26

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1413

- Ceres seq_id 1597961

- Location of start within SEQ ID NO 1410: at 246 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

243107 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1414 

- Ceres seq_id 1597962 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1415 

- Ceres seq_id 1597963 

- Location of start within SEQ ID NO 1414: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 692 

- gi No. 585777 

- Description: GTP-BINDING NUCLEAR PROTEIN RAN1 >gi|453561 (L28713)
Ran protein/TC4 protein [Solanum lycopersicum] 

- % Identity: 96.4 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 1415: from 36 to 62 

Maximum Length Sequence: 

related to: 
Clone IDs: 

243124 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1416 

- Ceres seq_id 1597964 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1417 

- Ceres seq_id 1597965 

- Location of start within SEQ ID NO 1416: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1418 

- Ceres seq_id 1597966 

- Location of start within SEQ ID NO 1416: at 69 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peptidyl-prolyl cis-trans isomerase 

- Location within SEQ ID NO 1418: from 5 to 71 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 693 

- gi No. 118104 

- Description: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE (PPIASE)
(ROTAMASE) (CYCLOPHILIN) (CYCLOSPORIN A-BINDING PROTEIN) >gi|68408|pir||CSZM
peptidylprolyl isomerase (EC 5.2.1.8) - maize >gi|168461 (M55021) cyclophilin
[Zea mays] >gi|829148|emb|CAA48638|

- % Identity: 95.8 

- Alignment Length: 72

- Location of Alignment in SEQ ID NO 1418: from 1 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1419 

- Ceres seq_id 1597967 

- Location of start within SEQ ID NO 1416: at 96 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peptidyl-prolyl cis-trans isomerase 

- Location within SEQ ID NO 1419: from 1 to 62 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 694 

- gi No. 118104 

- Description: PEPTIDYL-PROLYL CIS-TRANS ISOMERASE (PPIASE)
(ROTAMASE) (CYCLOPHILIN) (CYCLOSPORIN A-BINDING PROTEIN) >gi|68408|pir||CSZM
peptidylprolyl isomerase (EC 5.2.1.8) - maize >gi|168461 (M55021) cyclophilin
[Zea mays] >gi|829148|emb|CAA48638|

- % Identity: 95.8 

- Alignment Length: 72 

- Location of Alignment in SEQ ID NO 1419: from 1 to 62 

Maximum Length Sequence: 

related to: 
Clone IDs:

243616 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1420 

- Ceres seq_id 1597985 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1421 

- Ceres seq_id 1597986 

- Location of start within SEQ ID NO 1420: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1422 

- Ceres seq_id 1597987 

- Location of start within SEQ ID NO 1420: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1422: from 78 to 129 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 695 

- gi No. 1362185 

- Description: histone H2B123 - wheat >gi|531052|dbj|BAA07158|
(D37944) protein H2B123 [Triticum aestivum]

- % Identity: 76.9 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 1422: from 79 to 129

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1423 

- Ceres seq_id 1597988 

- Location of start within SEQ ID NO 1420: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4

- Location within SEQ ID NO 1423: from 52 to 103 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 696 

- gi No. 1362185 

- Description: histone H2B123 - wheat >gi|531052|dbj|BAA07158|
(D37944) protein H2B123 [Triticum aestivum] 

- % Identity: 76.9

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 1423: from 53 to 103 

Maximum Length Sequence: 

related to: 
Clone IDs: 

244329 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1424 

- Ceres seq_id 1598004 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1425 

- Ceres seq_id 1598005 

- Location of start within SEQ ID NO 1424: at 88 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 697 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 90.1 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1425: from 1 to 81 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1426 

- Ceres seq_id 1598006 

- Location of start within SEQ ID NO 1424: at 172 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 698 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 90.1 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1426: from 1 to 53 



Maximum Length Sequence:

related to:
Clone IDs:

244537 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1427 

- Ceres seq_id 1598007 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1428 

- Ceres seq_id 1598008 

- Location of start within SEQ ID NO 1427: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1429 

- Ceres seq_id 1598009

- Location of start within SEQ ID NO 1427: at 136 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 1429: from 4 to 67 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 699

- gi No. 2582643 

- Description: (AJ002377) RSZp21 protein [Arabidopsis thaliana] 

- % Identity: 72.7 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 1429: from 1 to 108 

Maximum Length Sequence: 

related to: 
Clone IDs: 

244958 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1430 

- Ceres seq_id 1598010 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1431 

- Ceres seq_id 1598011

- Location of start within SEQ ID NO 1430: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 700

- gi No. 347453 

- Description: (L22029) hydroxyproline-rich glycoprotein [Glycine max]

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1431: from 27 to 40 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1432

- Ceres seq_id 1598012 

- Location of start within SEQ ID NO 1430: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 701

- gi No. 4521296 

- Description: (AB017909) myoK [Dictyostelium discoideum] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1432: from 8 to 18 

Maximum Length Sequence: 

related to: 
Clone IDs: 

245999 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1433 

- Ceres seq_id 1598024 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1434 

- Ceres seq_id 1598025 

- Location of start within SEQ ID NO 1433: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 702

- gi No. 4138732 

- Description: (Y17332) proline-rich protein [Zea mays] 

- % Identity: 100 

- Alignment Length: 39

- Location of Alignment in SEQ ID NO 1434: from 77 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1435 

- Ceres seq_id 1598026 

- Location of start within SEQ ID NO 1433: at 109 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 703

- gi No. 4138732 

- Description: (Y17332) proline-rich protein [Zea mays] 

- % Identity: 81.8 

- Alignment Length: 22 

- Location of Alignment in SEQ ID NO 1435: from 1 to 21 

Maximum Length Sequence: 

related to: 
Clone IDs: 

247147 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1436 

- Ceres seq_id 1598040 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1437 

- Ceres seq_id 1598041 

- Location of start within SEQ ID NO 1436: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1437: from 23 to 148 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 704

- gi No. 1709970 

- Description: 60S RIBOSOMAL PROTEIN L10A 

- % Identity: 83.3 

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1437: from 23 to 148 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1438 

- Ceres seq_id 1598042 

- Location of start within SEQ ID NO 1436: at 67 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- L1P family of ribosomal proteins

- Location within SEQ ID NO 1438: from 1 to 126 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 705

- gi No. 1709970 

- Description: 60S RIBOSOMAL PROTEIN L10A 

- % Identity: 83.3 

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1438: from 1 to 126 

Maximum Length Sequence: 

related to: 
Clone IDs: 

247250 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1439 

- Ceres seq_id 1598043 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1440 

- Ceres seq_id 1598044 

- Location of start within SEQ ID NO 1439: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 1440: from 36 to 145 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 706 

- gi No. 2982322 

- Description: (AF051246) probable proteasome subunit [Picea mariana]

- % Identity: 88.3 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 1440: from 27 to 145 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1441

- Ceres seq_id 1598045 

- Location of start within SEQ ID NO 1439: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1442 

- Ceres seq_id 1598046 

- Location of start within SEQ ID NO 1439: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Proteasome A-type and B-type 

- Location within SEQ ID NO 1442: from 10 to 119 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 707

- gi No. 2982322 

- Description: (AF051246) probable proteasome subunit [Picea mariana]

- % Identity: 88.3 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 1442: from 1 to 119 

Maximum Length Sequence: 

related to: 
Clone IDs: 

247268 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1443 

- Ceres seq_id 1598047 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1444 

- Ceres seq_id 1598048 

- Location of start within SEQ ID NO 1443: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 708 

- gi No. 2554937 

- Description: (AF022151) homeobox protein [Gallus gallus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1444: from 52 to 62 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1445 

- Ceres seq_id 1598049 

- Location of start within SEQ ID NO 1443: at 166 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1446 

- Ceres seq_id 1598050 

- Location of start within SEQ ID NO 1443: at 201 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence: 

related to: 
Clone IDs: 

240057 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1447 

- Ceres seq_id 1598051 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1448 

- Ceres seq_id 1598052 

- Location of start within SEQ ID NO 1447: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1449

- Ceres seq_id 1598053 

- Location of start within SEQ ID NO 1447: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1450 

- Ceres seq_id 1598054 

- Location of start within SEQ ID NO 1447: at 64 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 709

- gi No. 122087 

- Description: HISTONE H3 >gi|81849|pir||S04520 histone H3 (clone
pH3c-1) - alfalfa >gi|82609|pir||A26014 histone H3 - wheat
>gi|19607|emb|CAA31964| (X13673) histone H3 (AA 1-136) [Medicago sativa]
>gi|19609|emb|CAA31965| (X13674) histone H3 (AA 1-136)

- % Identity: 98.5 

- Alignment Length: 133 

- Location of Alignment in SEQ ID NO 1450: from 1 to 34 

Maximum Length Sequence: 

related to: 
Clone IDs: 

240869 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1451 

- Ceres seq_id 1598059 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1452 

- Ceres seq_id 1598060 

- Location of start within SEQ ID NO 1451: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp90 protein 

- Location within SEQ ID NO 1452: from 26 to 51 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 710 

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa]

- % Identity: 77 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 1452: from 23 to 75 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1453 

- Ceres seq_id 1598061

- Location of start within SEQ ID NO 1451: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp90 protein 

- Location within SEQ ID NO 1453: from 4 to 29 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 711 

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa] 

- % Identity: 77 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 1453: from 1 to 53 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1454 

- Ceres seq_id 1598062 

- Location of start within SEQ ID NO 1451: at 192 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

241163 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1455 

- Ceres seq_id 1598063 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1456 

- Ceres seq_id 1598064 

- Location of start within SEQ ID NO 1455: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 712 

- gi No. 4741896 

- Description: (AF127042) 60S ribosomal protein L37a [Gossypium hirsutum]

- % Identity: 85.7 

- Alignment Length: 28

- Location of Alignment in SEQ ID NO 1456: from 48 to 74

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1457 

- Ceres seq_id 1598065 

- Location of start within SEQ ID NO 1455: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 713 

- gi No. 4741896 

- Description: (AF127042) 60S ribosomal protein L37a [Gossypium hirsutum]

- % Identity: 95.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1457: from 27 to 47 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1458 

- Ceres seq_id 1598066 

- Location of start within SEQ ID NO 1455: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 714 

- gi No. 4741896 

- Description: (AF127042) 60S ribosomal protein L37a [Gossypium hirsutum]

- % Identity: 95.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1458: from 1 to 21 

Maximum Length Sequence: 

related to: 
Clone IDs: 

241366 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1459 

- Ceres seq_id 1598071 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1460

- Ceres seq_id 1598072

- Location of start within SEQ ID NO 1459: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1461

- Ceres seq_id 1598073 

- Location of start within SEQ ID NO 1459: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L6 

- Location within SEQ ID NO 1461: from 37 to 100 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 715

- gi No. 2058273 

- Description: (D83527) YK426 [Oryza sativa] 

- % Identity: 81.3 

- Alignment Length: 80 

- Location of Alignment in SEQ ID NO 1461: from 23 to 100

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1462

- Ceres seq_id 1598074

- Location of start within SEQ ID NO 1459: at 69 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L6 

- Location within SEQ ID NO 1462: from 15 to 78 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 716 

- gi No. 2058273 

- Description: (D83527) YK426 [Oryza sativa] 

- % Identity: 81.3 

- Alignment Length: 80

- Location of Alignment in SEQ ID NO 1462: from 1 to 78

Maximum Length Sequence: 

related to: 
Clone IDs: 

243525 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1463 

- Ceres seq_id 1598092 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1464

- Ceres seq_id 1598093

- Location of start within SEQ ID NO 1463: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 717 

- gi No. 3914899 

- Description: 40S RIBOSOMAL PROTEIN S4 >gi|2331301 (AF013487)
ribosomal protein S4 type I [Zea mays] 

- % Identity: 89.7 

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 1464: from 31 to 59 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1465

- Ceres seq_id 1598094

- Location of start within SEQ ID NO 1463: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 718 

- gi No. 3914899 

- Description: 40S RIBOSOMAL PROTEIN S4 >gi|2331301 (AF013487)
ribosomal protein S4 type I [Zea mays]

- % Identity: 89.7

- Alignment Length: 29

- Location of Alignment in SEQ ID NO 1465: from 1 to 29



Maximum Length Sequence: 

related to: 
Clone IDs: 

244732 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1466 

- Ceres seq_id 1598108 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1467

- Ceres seq_id 1598109

- Location of start within SEQ ID NO 1466: at 124 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 1467: from 3 to 99 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 719 

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 92 

- Alignment Length: 100 

- Location of Alignment in SEQ ID NO 1467: from 1 to 99

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1468

- Ceres seq_id 1598110 

- Location of start within SEQ ID NO 1466: at 148 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 1468: from 1 to 91 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 720

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 92 

- Alignment Length: 100 

- Location of Alignment in SEQ ID NO 1468: from 1 to 91

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1469 

- Ceres seq_id 1598111

- Location of start within SEQ ID NO 1466: at 187 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 1469: from 1 to 78 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 721 

- gi No. 1519251 

- Description: (U65957) GF14-C protein [Oryza sativa] 

- % Identity: 92

- Alignment Length: 100

- Location of Alignment in SEQ ID NO 1469: from 1 to 78



Maximum Length Sequence: 

related to: 
Clone IDs:

245061 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1470 

- Ceres seq_id 1598115 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1471 

- Ceres seq_id 1598116 

- Location of start within SEQ ID NO 1470: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 722

- gi No. 266944 

- Description: 60S RIBOSOMAL PROTEIN L2 (L8) (RIBOSOMAL PROTEIN
TL2) >gi|71078|pir||R5TOL8 ribosomal protein L8, cytosolic - tomato
>gi|19343|emb|CAA45863| (X64562) ribosomal protein L2 [Lycopersicon
esculentum]

- % Identity: 78.3 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1471: from 109 to 123 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1472 

- Ceres seq_id 1598117 

- Location of start within SEQ ID NO 1470: at 50 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 723 

- gi No. 266944 

- Description: 60S RIBOSOMAL PROTEIN L2 (L8) (RIBOSOMAL PROTEIN
TL2) >gi|71078|pir||R5TOL8 ribosomal protein L8, cytosolic - tomato
>gi|19343|emb|CAA45863| (X64562) ribosomal protein L2 [Lycopersicon
esculentum]

- % Identity: 78.3 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1472: from 93 to 107 



Maximum Length Sequence: 

related to: 
Clone IDs: 

245595 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1473 

- Ceres seq_id 1598124 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1474 

- Ceres seq_id 1598125

- Location of start within SEQ ID NO 1473: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 724

- gi No. 464621 

- Description: 60S RIBOSOMAL PROTEIN L6 (YL16-LIKE)
>gi|280374|pir||S28586 ribosomal protein ML16 - common ice plant
>gi|19539|emb|CAA49175| (X69378) ribosomal protein YL16 [Mesembryanthemum
crystallinum]

- % Identity: 86.7 

- Alignment Length: 15

- Location of Alignment in SEQ ID NO 1474: from 141 to 154

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1475 

- Ceres seq_id 1598126 

- Location of start within SEQ ID NO 1473: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L6e 

- Location within SEQ ID NO 1475: from 84 to 140 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1476 

- Ceres seq_id 1598127 

- Location of start within SEQ ID NO 1473: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

246296 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1477 

- Ceres seq_id 1598131 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1478 

- Ceres seq_id 1598132 

- Location of start within SEQ ID NO 1477: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 725

- gi No. 2286111 

- Description: (U78891) MADS box protein [Oryza sativa] 

- % Identity: 73.3 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1478: from 58 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1479

- Ceres seq_id 1598133

- Location of start within SEQ ID NO 1477: at 186 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

241417 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1480

- Ceres seq_id 1598138 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1481 

- Ceres seq_id 1598139 

- Location of start within SEQ ID NO 1480: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 726 

- gi No. 2599072 

- Description: (AF028601) dihydroflavonol 4-reductase [Ipomoea purpurea]

- % Identity: 73.5 

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 1481: from 14 to 47 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1482 

- Ceres seq_id 1598140 

- Location of start within SEQ ID NO 1480: at 20 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 727 

- gi No. 2599072 

- Description: (AF028601) dihydroflavonol 4-reductase [Ipomoea purpurea]

- % Identity: 73.5 

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 1482: from 8 to 41 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1483 

- Ceres seq_id 1598141 

- Location of start within SEQ ID NO 1480: at 46 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

259077 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1484

- Ceres seq_id 1598152
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1485

- Ceres seq_id 1598153

- Location of start within SEQ ID NO 1484: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Clathrin adaptor complex small chain 

- Location within SEQ ID NO 1485: from 49 to 165 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 728

- gi No. 3377820 

- Description: (AF076275) contains similarity to coatomer zeta 
chains [Arabidopsis thaliana] 

- % Identity: 75 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1485: from 55 to 165 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1486 

- Ceres seq_id 1598154 

- Location of start within SEQ ID NO 1484: at 116 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Clathrin adaptor complex small chain 

- Location within SEQ ID NO 1486: from 11 to 127 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 729 

- gi No. 3377820 

- Description: (AF076275) contains similarity to coatomer zeta
chains [Arabidopsis thaliana] 

- % Identity: 75 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1486: from 17 to 127 

Maximum Length Sequence:

related to: 
Clone IDs: 

259736 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1487 

- Ceres seq_id 1598163 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1488

- Ceres seq_id 1598164 

- Location of start within SEQ ID NO 1487: at 45 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein L36e 

- Location within SEQ ID NO 1488: from 25 to 101 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1489 

- Ceres seq_id 1598165 

- Location of start within SEQ ID NO 1487: at 96 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L36e 

- Location within SEQ ID NO 1489: from 8 to 84 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1490 

- Ceres seq_id 1598166 

- Location of start within SEQ ID NO 1487: at 165 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L36e 

- Location within SEQ ID NO 1490: from 1 to 61 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

260523 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1491

- Ceres seq_id 1598171 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1492

- Ceres seq_id 1598172

- Location of start within SEQ ID NO 1491: at 141 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L19e 

- Location within SEQ ID NO 1492: from 2 to 87 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 730

- gi No. 2760155 

- Description: (AB010048) ribosomal protein L19 
[Schizosaccharomyces pombe] 

- % Identity: 73.7 

- Alignment Length: 76

- Location of Alignment in SEQ ID NO 1492: from 8 to 83 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1493 

- Ceres seq_id 1598173

- Location of start within SEQ ID NO 1491: at 220 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L19e 

- Location within SEQ ID NO 1493: from 5 to 88 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1494

- Ceres seq_id 1598174

- Location of start within SEQ ID NO 1491: at 280 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L19e

- Location within SEQ ID NO 1494: from 1 to 68 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

264188 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1495

- Ceres seq_id 1598193 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1496 

- Ceres seq_id 1598194 

- Location of start within SEQ ID NO 1495: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1497

- Ceres seq_id 1598195

- Location of start within SEQ ID NO 1495: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1497: from 13 to 71 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 731 

- gi No. 1154954 

- Description: (X94693) histone H2A [Triticum aestivum] 

- % Identity: 83.7 

- Alignment Length: 43 

- Location of Alignment in SEQ ID NO 1497: from 29 to 71

Maximum Length Sequence:

related to: 
Clone IDs: 

273130 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1498 

- Ceres seq_id 1598207 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1499

- Ceres seq_id 1598208 

- Location of start within SEQ ID NO 1498: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 732

- gi No. 4666287 

- Description: (D85764) cytosolic monodehydroascorbate reductase 
[Oryza sativa] 

- % Identity: 92.5 

- Alignment Length: 161 

- Location of Alignment in SEQ ID NO 1499: from 1 to 160

Maximum Length Sequence:

related to: 
Clone IDs: 

273521 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1500 

- Ceres seq_id 1598213 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1501 

- Ceres seq_id 1598214 

- Location of start within SEQ ID NO 1500: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 733

- gi No. 1326372 

- Description: (U58750) Similar to Histone. [Caenorhabditis
elegans]

- % Identity: 82.2 

- Alignment Length: 45 

- Location of Alignment in SEQ ID NO 1501: from 18 to 61 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1502 

- Ceres seq_id 1598215 

- Location of start within SEQ ID NO 1500: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1503 

- Ceres seq_id 1598216 

- Location of start within SEQ ID NO 1500: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

273754 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1504 

- Ceres seq_id 1598217 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1505 

- Ceres seq_id 1598218 

- Location of start within SEQ ID NO 1504: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 734

- gi No. 498902 

- Description: (U10044) ribosomal protein L27 homolog [Pisum
sativum]

- % Identity: 89.5

- Alignment Length: 19

- Location of Alignment in SEQ ID NO 1505: from 23 to 40 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1506 

- Ceres seq_id 1598219 

- Location of start within SEQ ID NO 1504: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

273880 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1507 

- Ceres seq_id 1598224
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1508 

- Ceres seq_id 1598225 

- Location of start within SEQ ID NO 1507: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 735

- gi No. 4038471

- Description: (AF111029) 40S ribosomal protein S27 homolog [Zea
mays]

- % Identity: 100

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1508: from 34 to 119 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1509 

- Ceres seq_id 1598226 

- Location of start within SEQ ID NO 1507: at 102 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 736

- gi No. 4038471

- Description: (AF111029) 40S ribosomal protein S27 homolog [Zea
mays]

- % Identity: 100

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1509: from 1 to 86

Maximum Length Sequence: 

related to: 
Clone IDs: 

274064 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1510 

- Ceres seq_id 1598227 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1511 

- Ceres seq_id 1598228

- Location of start within SEQ ID NO 1510: at 145 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S12e 

- Location within SEQ ID NO 1511: from 1 to 93 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 737

- gi No. 5106775 

- Description: (AF067732) ribosomal protein S12 [Hordeum vulgare] 

- % Identity: 87.9 

- Alignment Length: 99

- Location of Alignment in SEQ ID NO 1511: from 1 to 97 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1512 

- Ceres seq_id 1598229 

- Location of start within SEQ ID NO 1510: at 154 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S12e 

- Location within SEQ ID NO 1512: from 1 to 90 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 738

- gi No. 5106775 

- Description: (AF067732) ribosomal protein S12 [Hordeum vulgare] 

- % Identity: 87.9 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 1512: from 1 to 94

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1513 

- Ceres seq_id 1598230 

- Location of start within SEQ ID NO 1510: at 175 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S12e 

- Location within SEQ ID NO 1513: from 1 to 83 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 739 

- gi No. 5106775 

- Description: (AF067732) ribosomal protein S12 [Hordeum vulgare] 

- % Identity: 87.9 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 1513: from 1 to 87 



Maximum Length Sequence:

related to: 
Clone IDs: 

257748 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1514 

- Ceres seq_id 1598235 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1515

- Ceres seq_id 1598236

- Location of start within SEQ ID NO 1514: at 102 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 1515: from 19 to 85 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 740

- gi No. 4574244 

- Description: (AF108726) ribosomal protein L17 [Tortula ruralis] 

- % Identity: 88.2 

- Alignment Length: 85

- Location of Alignment in SEQ ID NO 1515: from 1 to 85 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1516 

- Ceres seq_id 1598237 

- Location of start within SEQ ID NO 1514: at 147 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 1516: from 4 to 70 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 741

- gi No. 4574244 

- Description: (AF108726) ribosomal protein L17 [Tortula ruralis] 

- % Identity: 88.2 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 1516: from 1 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 

262196 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1517 

- Ceres seq_id 1598252 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1518 

- Ceres seq_id 1598253 

- Location of start within SEQ ID NO 1517: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1519 

- Ceres seq_id 1598254 

- Location of start within SEQ ID NO 1517: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1520

- Ceres seq_id 1598255

- Location of start within SEQ ID NO 1517: at 75 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 1520: from 1 to 66 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 742

- gi No. 464986 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 9 (UBIQUITIN-PROTEIN
LIGASE 9) (UBIQUITIN CARRIER PROTEIN 9) (UBCAT4B)
>gi|421857|pir||S32674 ubiquitin--protein ligase (EC 6.3.2.19) UBC9 -
Arabidopsis thaliana >gi|297884|emb|CAA78714|

- % Identity: 95.5 

- Alignment Length: 66 

- Location of Alignment in SEQ ID NO 1520: from 1 to 66 

Maximum Length Sequence: 

related to: 
Clone IDs: 

263496 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1521 

- Ceres seq_id 1598257 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1522 

- Ceres seq_id 1598258 

- Location of start within SEQ ID NO 1521: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 743

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution

- % Identity: 100 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 1522: from 47 to 60 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1523 

- Ceres seq_id 1598259 

- Location of start within SEQ ID NO 1521: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

265940 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1524 

- Ceres seq_id 1598271
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1525 

- Ceres seq_id 1598272

- Location of start within SEQ ID NO 1524: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 744

- gi No. 399131 

- Description: BRAIN-SPECIFIC HOMEOBOX/POU DOMAIN PROTEIN 1 (BRN
PROTEIN) >gi|423400|pir||S31223 transcription factor brain-1 - mouse
>gi|200445 (M88299) Brain-1 class III POU-domain protein [Mus musculus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1525: from 9 to 19 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1526 

- Ceres seq_id 1598273 

- Location of start within SEQ ID NO 1524: at 96 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 745 

- gi No. 464705 

- Description: 40S RIBOSOMAL PROTEIN S13 >gi|419802|pir||S30146
ribosomal protein S13.e - maize >gi|288059|emb|CAA44311| (X62455)
cytoplasmatic ribosomal protein S13 [Zea mays] 

- % Identity: 98.9 

- Alignment Length: 93 

- Location of Alignment in SEQ ID NO 1526: from 1 to 92 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1527 

- Ceres seq_id 1598274 

- Location of start within SEQ ID NO 1524: at 105 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 746

- gi No. 464705 

- Description: 40S RIBOSOMAL PROTEIN S13 >gi|419802|pir||S30146
ribosomal protein S13.e - maize >gi|288059|emb|CAA44311| (X62455)
cytoplasmatic ribosomal protein S13 [Zea mays] 

- % Identity: 98.9 

- Alignment Length: 93 

- Location of Alignment in SEQ ID NO 1527: from 1 to 89 

Maximum Length Sequence: 

related to: 
Clone IDs: 

275215 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1528 

- Ceres seq_id 1598290 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1529 

- Ceres seq_id 1598291

- Location of start within SEQ ID NO 1528: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 747

- gi No. 1326372

- Description: (U58750) Similar to Histone. [Caenorhabditis
elegans]

- % Identity: 81.6 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 1529: from 22 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1530 

- Ceres seq_id 1598292 

- Location of start within SEQ ID NO 1528: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1531 

- Ceres seq_id 1598293 

- Location of start within SEQ ID NO 1528: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

274888 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1532 

- Ceres seq_id 1598318 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1533 

- Ceres seq_id 1598319 

- Location of start within SEQ ID NO 1532: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 1533: from 61 to 171 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 748 

- gi No. 3138799 

- Description: (AB014058) beta 6 subunit of 20S proteasome [Oryza
sativa]

- % Identity: 95.2

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1533: from 47 to 171 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1534 

- Ceres seq_id 1598320 

- Location of start within SEQ ID NO 1532: at 140 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Proteasome A-type and B-type 

- Location within SEQ ID NO 1534: from 15 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 749

- gi No. 3138799 

- Description: (AB014058) beta 6 subunit of 20S proteasome [Oryza
sativa]

- % Identity: 95.2

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1534: from 1 to 125 

Maximum Length Sequence:

related to: 
Clone IDs: 

275177 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1535 

- Ceres seq_id 1598321 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1536 

- Ceres seq_id 1598322 

- Location of start within SEQ ID NO 1535: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 750

- gi No. 2119012 

- Description: histone 3.3A - chicken >gi|211851 (M11667) histone
3.3A [Gallus gallus]

- % Identity: 75 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 1536: from 35 to 50 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1537 

- Ceres seq_id 1598323 

- Location of start within SEQ ID NO 1535: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 751

- gi No. 70760 

- Description: histone H3.4 - mouse

- % Identity: 93.8 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 1537: from 44 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1538 

- Ceres seq_id 1598324 

- Location of start within SEQ ID NO 1535: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

278243 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1539 

- Ceres seq_id 1598333 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1540 

- Ceres seq_id 1598334 

- Location of start within SEQ ID NO 1539: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 752 

- gi No. 2668750 

- Description: (AF034949) ribosomal protein L30 [Zea mays] 

- % Identity: 98.2 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1540: from 33 to 144

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1541

- Ceres seq_id 1598335

- Location of start within SEQ ID NO 1539: at 99 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 753

- gi No. 2668750 

- Description: (AF034949) ribosomal protein L30 [Zea mays] 

- % Identity: 98.2 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1541: from 1 to 112 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1542 

- Ceres seq_id 1598336 

- Location of start within SEQ ID NO 1539: at 162 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 754

- gi No. 2668750 

- Description: (AF034949) ribosomal protein L30 [Zea mays] 

- % Identity: 98.2 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1542: from 1 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

280953 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1543 

- Ceres seq_id 1598414 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1544

- Ceres seq_id 1598415

- Location of start within SEQ ID NO 1543: at 154 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal S17 

- Location within SEQ ID NO 1544: from 2 to 78 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 755

- gi No. 1350944 

- Description: 40S RIBOSOMAL PROTEIN S17 

- % Identity: 91 

- Alignment Length: 78

- Location of Alignment in SEQ ID NO 1544: from 1 to 78 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1545 

- Ceres seq_id 1598416 

- Location of start within SEQ ID NO 1543: at 223 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Ribosomal S17 

- Location within SEQ ID NO 1545: from 1 to 55 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 756

- gi No. 1350944 

- Description: 40S RIBOSOMAL PROTEIN S17 

- % Identity: 91 

- Alignment Length: 78

- Location of Alignment in SEQ ID NO 1545: from 1 to 55 



Maximum Length Sequence: 

related to: 
Clone IDs: 

281155 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1546

- Ceres seq_id 1598417 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1547 

- Ceres seq_id 1598418 

- Location of start within SEQ ID NO 1546: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 757 

- gi No. 1170403 

- Description: SPERM PROTAMINE P1 >gi|548216 (L32753) protamine
[Planigale ingrami] 

- % Identity: 73.3 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1547: from 121 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1548 

- Ceres seq_id 1598419 

- Location of start within SEQ ID NO 1546: at 134 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 758

- gi No. 1170403 

- Description: SPERM PROTAMINE P1 >gi|548216 (L32753) protamine P1
[Planigale ingrami] 

- % Identity: 73.3 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1548: from 77 to 91 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1549 

- Ceres seq_id 1598420 

- Location of start within SEQ ID NO 1546: at 205 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

281685 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1550 

- Ceres seq_id 1598425 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1551 

- Ceres seq_id 1598426 

- Location of start within SEQ ID NO 1550: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Copper/zinc superoxide dismutase (SODC)

- Location within SEQ ID NO 1551: from 52 to 165 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 759

- gi No. 134613

- Description: SUPEROXIDE DISMUTASE-2 (CU-ZN) >gi|82727|pir||A29077
superoxide dismutase (EC 1.15.1.1) (Cu-Zn) 2 - maize >gi|168620 (M54936)
superoxide dismutase 2 [Zea mays] >gi|168622 (M15175) SOD2 protein [Zea mays]

- % Identity: 99.1 

- Alignment Length: 116 

- Location of Alignment in SEQ ID NO 1551: from 51 to 165 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1552 

- Ceres seq_id 1598427 

- Location of start within SEQ ID NO 1550: at 151 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Copper/zinc superoxide dismutase (SODC)

- Location within SEQ ID NO 1552: from 2 to 115 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 760

- gi No. 134613

- Description: SUPEROXIDE DISMUTASE-2 (CU-ZN) >gi|82727|pir||A29077
superoxide dismutase (EC 1.15.1.1) (Cu-Zn) 2 - maize >gi|168620 (M54936)
superoxide dismutase 2 [Zea mays] >gi|168622 (M15175) SOD2 protein [Zea mays]

- % Identity: 99.1 

- Alignment Length: 116 

- Location of Alignment in SEQ ID NO 1552: from 1 to 115

Maximum Length Sequence:

related to: 
Clone IDs: 

283076 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1553 

- Ceres seq_id 1598452 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1554 

- Ceres seq_id 1598453 

- Location of start within SEQ ID NO 1553: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S21e 

- Location within SEQ ID NO 1554: from 36 to 116 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 761

- gi No. 2500497 

- Description: 40S RIBOSOMAL PROTEIN S21
>gi|1419372|emb|CAA67225.1| (X98656) ribosomal protein S21 [Zea mays]

- % Identity: 97.5 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1554: from 36 to 116 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1555 

- Ceres seq_id 1598454 

- Location of start within SEQ ID NO 1553: at 106 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S21e 

- Location within SEQ ID NO 1555: from 1 to 81 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 762

- gi No. 2500497 

- Description: 40S RIBOSOMAL PROTEIN S21
>gi|1419372|emb|CAA67225.1| (X98656) ribosomal protein S21 [Zea mays]

- % Identity: 97.5 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1555: from 1 to 81 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1556 

- Ceres seq_id 1598455 

- Location of start within SEQ ID NO 1553: at 140 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

283255 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1557 

- Ceres seq_id 1598456 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1558 

- Ceres seq_id 1598457 

- Location of start within SEQ ID NO 1557: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1559 

- Ceres seq_id 1598458 

- Location of start within SEQ ID NO 1557: at 57 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic ribosomal protein L18 

- Location within SEQ ID NO 1559: from 11 to 118 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 763

- gi No. 1172977 

- Description: 60S RIBOSOMAL PROTEIN L18 >gi|606970 (U15741)
cytoplasmic ribosomal protein L18 [Arabidopsis thaliana] 

- % Identity: 77.1 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 1559: from 1 to 118 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1560 

- Ceres seq_id 1598459

- Location of start within SEQ ID NO 1557: at 216 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic ribosomal protein L18 

- Location within SEQ ID NO 1560: from 1 to 65 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 764

- gi No. 1172977 

- Description: 60S RIBOSOMAL PROTEIN L18 >gi|606970 (U15741)
cytoplasmic ribosomal protein L18 [Arabidopsis thaliana] 

- % Identity: 77.1 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 1560: from 1 to 65 

Maximum Length Sequence: 

related to: 
Clone IDs: 

283568 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1561 

- Ceres seq_id 1598463
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1562 

- Ceres seq_id 1598464 

- Location of start within SEQ ID NO 1561: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1563 

- Ceres seq_id 1598465 

- Location of start within SEQ ID NO 1561: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1564 

- Ceres seq_id 1598466 

- Location of start within SEQ ID NO 1561: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 765

- gi No. 1351222 

- Description: TRANSCRIPTION INITIATION FACTOR IIB (TFIIB) 
>gi|945087 (U31097) transcription factor TFIIB [Glycine max]

- % Identity: 73.9 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1564: from 54 to 76

Maximum Length Sequence:

related to: 
Clone IDs: 

283801 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1565 

- Ceres seq_id 1598469 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1566 

- Ceres seq_id 1598470 

- Location of start within SEQ ID NO 1565: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1567 

- Ceres seq_id 1598471

- Location of start within SEQ ID NO 1565: at 156 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 766

- gi No. 2493852

- Description: CYTOCHROME C OXIDASE POLYPEPTIDE VC
>gi|1070356|emb|CAA92107| (Z68091) cytochrome c oxidase, Vc subunit [Hordeum
vulgare]

- % Identity: 96.8 

- Alignment Length: 63 

- Location of Alignment in SEQ ID NO 1567: from 1 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1568 

- Ceres seq_id 1598472 

- Location of start within SEQ ID NO 1565: at 301 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

285132 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1569 

- Ceres seq_id 1598480
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1570

- Ceres seq_id 1598481

- Location of start within SEQ ID NO 1569: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1571 

- Ceres seq_id 1598482

- Location of start within SEQ ID NO 1569: at 132 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 767

- gi No. 453670 

- Description: (L28712) heat shock protein 26 [Zea mays] 
>gi|227776|prf||1710350A heat shock protein 26 [Zea mays]

- % Identity: 100 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 1571: from 1 to 66 

Maximum Length Sequence: 

related to: 
Clone IDs:

285295 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1572 

- Ceres seq_id 1598487 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1573 

- Ceres seq_id 1598488

- Location of start within SEQ ID NO 1572: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 768

- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 89.5 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1573: from 12 to 68 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1574 

- Ceres seq_id 1598489 

- Location of start within SEQ ID NO 1572: at 141 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 769

- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 89.5 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1574: from 8 to 64 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1575 

- Ceres seq_id 1598490 

- Location of start within SEQ ID NO 1572: at 171 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 770

- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 89.5 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1575: from 1 to 54 

Maximum Length Sequence: 

related to: 
Clone IDs: 

280697 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1576

- Ceres seq_id 1598509 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1577 

- Ceres seq_id 1598510 

- Location of start within SEQ ID NO 1576: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 771 

- gi No. 2760330 

- Description: (AC002130) F1N21.15 [Arabidopsis thaliana]

- % Identity: 72.6

- Alignment Length: 62 

- Location of Alignment in SEQ ID NO 1577: from 100 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1578 

- Ceres seq_id 1598511 

- Location of start within SEQ ID NO 1576: at 117 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

280895 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1579 

- Ceres seq_id 1598512 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1580 

- Ceres seq_id 1598513 

- Location of start within SEQ ID NO 1579: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1581 

- Ceres seq_id 1598514 

- Location of start within SEQ ID NO 1579: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 772 

- gi No. 2765837 

- Description: (Z96936) NAP1 6kDa protein [Arabidopsis thaliana] 

- % Identity: 70.8 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 1581: from 49 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1582 

- Ceres seq_id 1598515 

- Location of start within SEQ ID NO 1579: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

280910 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1583 

- Ceres seq_id 1598516 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1584

- Ceres seq_id 1598517 

- Location of start within SEQ ID NO 1583: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1585

- Ceres seq_id 1598518

- Location of start within SEQ ID NO 1583: at 179 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal L10 

- Location within SEQ ID NO 1585: from 1 to 55 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 773

- gi No. 2500354

- Description: 60S RIBOSOMAL PROTEIN L10 (EQM)

>gi|1902894|dbj|BAA19462| (AB001891) QM family protein [Solanum melongena]

- % Identity: 94.5 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1585: from 1 to 55 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1586 

- Ceres seq_id 1598519 

- Location of start within SEQ ID NO 1583: at 293 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 774

- gi No. 2500354

- Description: 60S RIBOSOMAL PROTEIN L10 (EQM)

>gi|1902894|dbj|BAA19462| (AB001891) QM family protein [Solanum melongena]

- % Identity: 94.5 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1586: from 1 to 17 

Maximum Length Sequence: 

related to: 
Clone IDs:

282931 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1587 

- Ceres seq_id 1598528
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1588 

- Ceres seq_id 1598529 

- Location of start within SEQ ID NO 1587: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1589

- Ceres seq_id 1598530

- Location of start within SEQ ID NO 1587: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Chlorophyll A-B binding proteins 

- Location within SEQ ID NO 1589: from 48 to 168 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 775 

- gi No. 115771 

- Description: CHLOROPHYLL A-B BINDING PROTEIN OF LHCII TYPE I
PRECURSOR (CAB-1) (LHCP) >gi|82682|pir||S04453 chlorophyll a/b-binding
protein precursor - maize >gi|22224|emb|CAA32900| (X14794)
chlorophyll a/b-binding preprotein (AA 1 - 262) [Zea mays]

- % Identity: 100 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 1589: from 19 to 168 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1590 

- Ceres seq_id 1598531

- Location of start within SEQ ID NO 1587: at 56 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Chlorophyll A-B binding proteins 

- Location within SEQ ID NO 1590: from 30 to 150 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 776

- gi No. 115771

- Description: CHLOROPHYLL A-B BINDING PROTEIN OF LHCII TYPE I
PRECURSOR (CAB-1) (LHCP) >gi|82682|pir||S04453 chlorophyll a/b-binding
protein precursor - maize >gi|22224|emb|CAA32900| (X14794)
chlorophyll a/b-binding preprotein (AA 1 - 262) [Zea mays]

- % Identity: 100 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 1590: from 1 to 150 

Maximum Length Sequence: 

related to: 
Clone IDs: 

284175 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1591 

- Ceres seq_id 1598535 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1592 

- Ceres seq_id 1598536 

- Location of start within SEQ ID NO 1591: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 777

- gi No. 462387

- Description: IMMEDIATE-EARLY PROTEIN IE180 >gi|418707|pir||A45344
immediate-early protein - suid herpesvirus 1 (strain Kaplan) >gi|334071
(M34651) immediate-early protein [Pseudorabies virus]

- % Identity: 70.6

- Alignment Length: 17

- Location of Alignment in SEQ ID NO 1592: from 21 to 37 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1593 

- Ceres seq_id 1598537 

- Location of start within SEQ ID NO 1591: at 101 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

285729 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1594 

- Ceres seq_id 1598551 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1595

- Ceres seq_id 1598552 

- Location of start within SEQ ID NO 1594: at 93 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 778

- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 92.6 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1595: from 1 to 81

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1596

- Ceres seq_id 1598553

- Location of start within SEQ ID NO 1594: at 177 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 779

- gi No. 4090257

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga menziesii]

- % Identity: 92.6 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1596: from 1 to 53 

Maximum Length Sequence: 

related to: 
Clone IDs: 

286274 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1597 

- Ceres seq_id 1598560 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1598 

- Ceres seq_id 1598561

- Location of start within SEQ ID NO 1597: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 1598: from 86 to 114 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 780 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 1598: from 39 to 142 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1599 

- Ceres seq_id 1598562 

- Location of start within SEQ ID NO 1597: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1600 

- Ceres seq_id 1598563 

- Location of start within SEQ ID NO 1597: at 115 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 1600: from 48 to 76 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 781 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 1600: from 1 to 104 

Maximum Length Sequence: 

related to: 
Clone IDs: 

286359 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1601 

- Ceres seq_id 1598564 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1602 

- Ceres seq_id 1598565 

- Location of start within SEQ ID NO 1601: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DnaJ central domain (4 repeats)

- Location within SEQ ID NO 1602: from 78 to 133 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 782

- gi No. 4732091

- Description: (AF126742) bundle sheath defective protein 2 [Zea mays]

- % Identity: 100 

- Alignment Length: 129 

- Location of Alignment in SEQ ID NO 1602: from 21 to 149 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1603 

- Ceres seq_id 1598566 

- Location of start within SEQ ID NO 1601: at 61 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DnaJ central domain (4 repeats)

- Location within SEQ ID NO 1603: from 58 to 113 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 783

- gi No. 4732091

- Description: (AF126742) bundle sheath defective protein 2 [Zea mays]

- % Identity: 100

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 1603: from 1 to 129 

Maximum Length Sequence:

related to: 
Clone IDs: 

286726 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1604 

- Ceres seq_id 1598578 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1605 

- Ceres seq_id 1598579

- Location of start within SEQ ID NO 1604: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1606 

- Ceres seq_id 1598580 

- Location of start within SEQ ID NO 1604: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1607 

- Ceres seq_id 1598581

- Location of start within SEQ ID NO 1604: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 784

- gi No. 1002380 

- Description: (U24189) RRM-type RNA binding protein 
[Caenorhabditis elegans] 

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 1607: from 18 to 29 

Maximum Length Sequence: 

related to: 
Clone IDs: 

288342 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1608 

- Ceres seq_id 1598598 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1609 

- Ceres seq_id 1598599 

- Location of start within SEQ ID NO 1608: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1610 

- Ceres seq_id 1598600 

- Location of start within SEQ ID NO 1608: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 785

- gi No. 729762

- Description: 17.0 KD CLASS II HEAT SHOCK PROTEIN (HSP 18)

>gi|477225|pir||A48425 heat shock protein HSP18 - maize >gi|300079|bbs|130952
(S59777) HSP18=18 kda heat shock protein [Zea mays, Oh43, clone cMHSP18-1,
Peptide, 154 aa] [Zea mays]

- % Identity: 100 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1610: from 28 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1611 

- Ceres seq_id 1598601 

- Location of start within SEQ ID NO 1608: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 786

- gi No. 729762

- Description: 17.0 KD CLASS II HEAT SHOCK PROTEIN (HSP 18)

>gi|477225|pir||A48425 heat shock protein HSP18 - maize >gi|300079|bbs|130952
(S59777) HSP18=18 kda heat shock protein [Zea mays, Oh43, clone cMHSP18-1,
Peptide, 154 aa] [Zea mays]

- % Identity: 100

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1611: from 1 to 31 



Maximum Length Sequence: 

related to: 
Clone IDs: 

289350 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1612 

- Ceres seq_id 1598610 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1613 

- Ceres seq_id 1598611 

- Location of start within SEQ ID NO 1612: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Hsp90 protein 

- Location within SEQ ID NO 1613: from 1 to 110 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 787 

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa]

- % Identity: 97.3 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 1613: from 1 to 110 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1614 

- Ceres seq_id 1598612 

- Location of start within SEQ ID NO 1612: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp90 protein

- Location within SEQ ID NO 1614: from 1 to 88 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 788 

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa]

- % Identity: 97.3 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 1614: from 1 to 88 



Maximum Length Sequence:

related to: 
Clone IDs: 

290177 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1615 

- Ceres seq_id 1598623
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1616 

- Ceres seq_id 1598624

- Location of start within SEQ ID NO 1615: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family

- Location within SEQ ID NO 1616: from 52 to 123 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 789 

- gi No. 1405561 

- Description: (X98540) FSGTP1 [Fagus sylvatica] 

- % Identity: 92.9 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 1616: from 41 to 123 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1617 

- Ceres seq_id 1598625 

- Location of start within SEQ ID NO 1615: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family

- Location within SEQ ID NO 1617: from 15 to 86 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 790

- gi No. 1405561

- Description: (X98540) FSGTP1 [Fagus sylvatica]

- % Identity: 92.9

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 1617: from 4 to 86 

Maximum Length Sequence: 

related to: 
Clone IDs: 

291617 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1618 

- Ceres seq_id 1598631
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1619 

- Ceres seq_id 1598632 

- Location of start within SEQ ID NO 1618: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 791

- gi No. 4091080

- Description: (AF045571) nucleic acid binding protein [Oryza sativa]

- % Identity: 91.2 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1619: from 44 to 155 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1620 

- Ceres seq_id 1598633 

- Location of start within SEQ ID NO 1618: at 162 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 792

- gi No. 4091080

- Description: (AF045571) nucleic acid binding protein [Oryza sativa]

- % Identity: 91.2 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1620: from 1 to 102 



Maximum Length Sequence: 

related to: 
Clone IDs: 

291675 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1621 

- Ceres seq_id 1598634 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1622 

- Ceres seq_id 1598635 

- Location of start within SEQ ID NO 1621: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 1622: from 78 to 157 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 793

- gi No. 136640

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD (UBIQUITIN-
PROTEIN LIGASE) (UBIQUITIN CARRIER PROTEIN) >gi|170785 (M62720) ubiquitin
carrier protein [Triticum aestivum]

- % Identity: 90.1 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1622: from 78 to 157 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1623

- Ceres seq_id 1598636

- Location of start within SEQ ID NO 1621: at 195 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 794

- gi No. 136640

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD (UBIQUITIN-
PROTEIN LIGASE) (UBIQUITIN CARRIER PROTEIN) >gi|170785 (M62720) ubiquitin
carrier protein [Triticum aestivum]

- % Identity: 95.7 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1623: from 1 to 23 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1624

- Ceres seq_id 1598637

- Location of start within SEQ ID NO 1621: at 222 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 795

- gi No. 136640

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD (UBIQUITIN-
PROTEIN LIGASE) (UBIQUITIN CARRIER PROTEIN) >gi|170785 (M62720) ubiquitin
carrier protein [Triticum aestivum]

- % Identity: 95.7 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 1624: from 1 to 14 

Maximum Length Sequence: 

related to: 
Clone IDs: 

291681 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1625 

- Ceres seq_id 1598638 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1626 

- Ceres seq_id 1598639 

- Location of start within SEQ ID NO 1625: at 104 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutathione S-transferases.

- Location within SEQ ID NO 1626: from 21 to 104 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

292476 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1627 

- Ceres seq_id 1598640 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1628 

- Ceres seq_id 1598641 

- Location of start within SEQ ID NO 1627: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1629 

- Ceres seq_id 1598642 

- Location of start within SEQ ID NO 1627: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 796

- gi No. 122022

- Description: HISTONE H2B >gi|283025|pir||S22323 histone H2B -
wheat >gi|21801|emb|CAA42530| (X59873) histone H2B [Triticum aestivum]

- % Identity: 87

- Alignment Length: 23

- Location of Alignment in SEQ ID NO 1629: from 28 to 49

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1630 

- Ceres seq_id 1598643 

- Location of start within SEQ ID NO 1627: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

292530 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1631 

- Ceres seq_id 1598644
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1632 

- Ceres seq_id 1598645 

- Location of start within SEQ ID NO 1631: at 61 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1633 

- Ceres seq_id 1598646

- Location of start within SEQ ID NO 1631: at 64 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1634 

- Ceres seq_id 1598647 

- Location of start within SEQ ID NO 1631: at 203 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 1634: from 12 to 76 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

288165 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1635 

- Ceres seq_id 1598661 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1636 

- Ceres seq_id 1598662 

- Location of start within SEQ ID NO 1635: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 797

- gi No. 1883026

- Description: (X91513) histone H4 [Diprion pini]

- % Identity: 86.7

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1636: from 43 to 56 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1637 

- Ceres seq_id 1598663 

- Location of start within SEQ ID NO 1635: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1638 

- Ceres seq_id 1598664 

- Location of start within SEQ ID NO 1635: at 70 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 798

- gi No. 1883026 

- Description: (X91513) histone H4 [Diprion pini] 

- % Identity: 93.8 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 1638: from 1 to 16 

Maximum Length Sequence: 

related to: 
Clone IDs: 

289000 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1639 

- Ceres seq_id 1598668 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1640 

- Ceres seq_id 1598669 

- Location of start within SEQ ID NO 1639: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 799

- gi No. 607014 

- Description: (M83895) protamine 1 [Cavia porcellus] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1640: from 105 to 115 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1641 

- Ceres seq_id 1598670 

- Location of start within SEQ ID NO 1639: at 98 nt.



Table 1
Page 346

Attorney Docket No. 2750-1237P
Client Docket No. 80146.003

Table 1
Page 347



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein S21e 

- Location within SEQ ID NO 1641: from 1 to 81 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 800

- gi No. 2500497

- Description: 40S RIBOSOMAL PROTEIN S21

>gi|1419372|emb|CAA67225.1| (X98656) ribosomal protein S21 [Zea mays]

- % Identity: 100 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1641: from 1 to 81 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1642 

- Ceres seq_id 1598671

- Location of start within SEQ ID NO 1639: at 183 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 801

- gi No. 607014 

- Description: (M83895) protamine 1 [Cavia porcellus] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1642: from 45 to 55 



Maximum Length Sequence: 

related to: 
Clone IDs: 

293470 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1643 

- Ceres seq_id 1598689 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1644 

- Ceres seq_id 1598690 

- Location of start within SEQ ID NO 1643: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glutathione S-transferases.

- Location within SEQ ID NO 1644: from 26 to 87 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 802

- gi No. 4468794

- Description: (AJ010296) Glutathione transferase III(b) [Zea mays]

- % Identity: 98.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 1644: from 23 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1645 

- Ceres seq_id 1598691 

- Location of start within SEQ ID NO 1643: at 69 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Glutathione S-transferases.

- Location within SEQ ID NO 1645: from 4 to 65 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 803 

- gi No. 4468794 

- Description: (AJ010296) Glutathione transferase III(b) [Zea mays]

- % Identity: 98.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 1645: from 1 to 65 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1646

- Ceres seq_id 1598692

- Location of start within SEQ ID NO 1643: at 93 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Glutathione S-transferases.

- Location within SEQ ID NO 1646: from 1 to 57 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 804

- gi No. 4468794

- Description: (AJ010296) Glutathione transferase III(b) [Zea mays]

- % Identity: 98.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 1646: from 1 to 57 



Maximum Length Sequence:

related to: 
Clone IDs: 

293652 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1647 

- Ceres seq_id 1598693 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1648 

- Ceres seq_id 1598694

- Location of start within SEQ ID NO 1647: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family 

- Location within SEQ ID NO 1648: from 58 to 154 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 805

- gi No. 4959463 

- Description: (AF126054) RACC small GTP binding protein [Zea mays] 

- % Identity: 96.4 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1648: from 44 to 154 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1649 

- Ceres seq_id 1598695 

- Location of start within SEQ ID NO 1647: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family

- Location within SEQ ID NO 1649: from 14 to 110 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 806 

- gi No. 4959463 

- Description: (AF126054) RACC small GTP binding protein [Zea mays]

- % Identity: 96.4

- Alignment Length: 112

- Location of Alignment in SEQ ID NO 1649: from 1 to 110

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1650 

- Ceres seq_id 1598696 

- Location of start within SEQ ID NO 1647: at 216 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ras family 

- Location within SEQ ID NO 1650: from 1 to 83 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 807

- gi No. 4959463 

- Description: (AF126054) RACC small GTP binding protein [Zea mays] 

- % Identity: 96.4 

- Alignment Length: 112 

- Location of Alignment in SEQ ID NO 1650: from 1 to 83 

Maximum Length Sequence: 

related to: 
Clone IDs: 

294638 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1651 

- Ceres seq_id 1598730 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1652 

- Ceres seq_id 1598731 

- Location of start within SEQ ID NO 1651: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 808

- gi No. 4090265

- Description: (AJ131850) group I pollen allergen [Poa pratensis]

- % Identity: 77.8

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1652: from 38 to 163 

Maximum Length Sequence: 

related to: 
Clone IDs: 

295332 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1653 

- Ceres seq_id 1598732 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1654 

- Ceres seq_id 1598733

- Location of start within SEQ ID NO 1653: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1655 

- Ceres seq_id 1598734 

- Location of start within SEQ ID NO 1653: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1656

- Ceres seq_id 1598735

- Location of start within SEQ ID NO 1653: at 119 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 809

- gi No. 1352054 

- Description: ATP SYNTHASE 6 KD SUBUNIT, MITOCHONDRIAL 

- % Identity: 76

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 1656: from 1 to 25 

Maximum Length Sequence: 

related to: 
Clone IDs: 

296868 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1657 

- Ceres seq_id 1598744 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1658 

- Ceres seq_id 1598745 

- Location of start within SEQ ID NO 1657: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribonucleotide reductase 

- Location within SEQ ID NO 1658: from 1 to 166 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 810 

- gi No. 3661603 

- Description: (AF092841) ribonucleoside-diphosphate reductase 
large subunit [Arabidopsis thaliana] 

- % Identity: 88.6 

- Alignment Length: 167 

- Location of Alignment in SEQ ID NO 1658: from 1 to 166 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1659

- Ceres seq_id 1598746

- Location of start within SEQ ID NO 1657: at 57 nt.



Table 1 
Page 350 



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 1 
Page 351 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribonucleotide reductase 

- Location within SEQ ID NO 1659: from 1 to 148 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 811 

- gi No. 3661603 

- Description: (AF092841) ribonucleoside-diphosphate reductase 
large subunit [Arabidopsis thaliana] 

- % Identity: 88.6 

- Alignment Length: 167 

- Location of Alignment in SEQ ID NO 1659: from 1 to 148 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1660 

- Ceres seq_id 1598747 

- Location of start within SEQ ID NO 1657: at 120 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribonucleotide reductase 

- Location within SEQ ID NO 1660: from 1 to 127 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 812 

- gi No. 3661603 

- Description: (AF092841) ribonucleoside-diphosphate reductase 
large subunit [Arabidopsis thaliana] 

- % Identity: 88.6 

- Alignment Length: 167 

- Location of Alignment in SEQ ID NO 1660: from 1 to 127 

Maximum Length Sequence: 

related to: 
Clone IDs: 

297059 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1661 

- Ceres seq_id 1598752 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1662 

- Ceres seq_id 1598753 

- Location of start within SEQ ID NO 1661: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Protamine P1

- Location within SEQ ID NO 1662: from 33 to 91 aa. 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1663 

- Ceres seq_id 1598754 

- Location of start within SEQ ID NO 1661: at 117 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1664 

- Ceres seq_id 1598755

- Location of start within SEQ ID NO 1661: at 132 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

298498 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1665 

- Ceres seq_id 1598764 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1666 

- Ceres seq_id 1598765 

- Location of start within SEQ ID NO 1665: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 813 

- gi No. 1362183 

- Description: histone H2B-6 - wheat >gi|531056|dbj|BAA07156|
(D37942) protein H2B-6 [Triticum aestivum]

- % Identity: 73.1 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1666: from 26 to 48 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1667 

- Ceres seq_id 1598766 

- Location of start within SEQ ID NO 1665: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1668 

- Ceres seq_id 1598767 

- Location of start within SEQ ID NO 1665: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 814 

- gi No. 1362183 

- Description: histone H2B-6 - wheat >gi|531056|dbj|BAA07156|
(D37942) protein H2B-6 [Triticum aestivum] 

- % Identity: 73.1 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1668: from 1 to 23 

Maximum Length Sequence:

related to: 
Clone IDs: 

298621

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1669 

- Ceres seq_id 1598768 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1670 

- Ceres seq_id 1598769 

- Location of start within SEQ ID NO 1669: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 815 

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 90.5 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1670: from 93 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1671 

- Ceres seq_id 1598770 

- Location of start within SEQ ID NO 1669: at 74 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 816 

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 90.5 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1671: from 69 to 89 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1672 

- Ceres seq_id 1598771 

- Location of start within SEQ ID NO 1669: at 243 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

297447 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1673 

- Ceres seq_id 1598781 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1674

- Ceres seq_id 1598782

- Location of start within SEQ ID NO 1673: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 817 

- gi No. 122087 

- Description: HISTONE H3 >gi|81849|pir||S04520 histone H3 (clone
pH3c-1) - alfalfa >gi|82609|pir||A26014 histone H3 - wheat
>gi|19607|emb|CAA31964| (X13673) histone H3 (AA 1-136) [Medicago sativa]
>gi|19609|emb|CAA31965| (X13674) histone H3 (AA 1-136)

- % Identity: 98 

- Alignment Length: 50

- Location of Alignment in SEQ ID NO 1674: from 26 to 75 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1675 

- Ceres seq_id 1598783 

- Location of start within SEQ ID NO 1673: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1676

- Ceres seq_id 1598784

- Location of start within SEQ ID NO 1673: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 818 

- gi No. 122087 

- Description: HISTONE H3 >gi|81849|pir||S04520 histone H3 (clone
pH3c-1) - alfalfa >gi|82609|pir||A26014 histone H3 - wheat
>gi|19607|emb|CAA31964| (X13673) histone H3 (AA 1-136) [Medicago sativa]
>gi|19609|emb|CAA31965| (X13674) histone H3 (AA 1-136)

- % Identity: 98 

- Alignment Length: 50 

- Location of Alignment in SEQ ID NO 1676: from 1 to 50 

Maximum Length Sequence:

related to: 
Clone IDs: 

299330 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1677 

- Ceres seq_id 1598795 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1678

- Ceres seq_id 1598796

- Location of start within SEQ ID NO 1677: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 819 

- gi No. 100219 

- Description: glycine-rich protein (clone uK-4) - tomato
>gi|1345534|emb|CAA39225| (X55696) glycine-rich protein [Lycopersicon
esculentum]

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 1678: from 38 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1679

- Ceres seq_id 1598797

- Location of start within SEQ ID NO 1677: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 820 

- gi No. 553278 

- Description: (L12699) engrailed protein [Homo sapiens] 

- % Identity: 71.4 

- Alignment Length: 14

- Location of Alignment in SEQ ID NO 1679: from 38 to 51

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1680 

- Ceres seq_id 1598798 

- Location of start within SEQ ID NO 1677: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

299359 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1681 

- Ceres seq_id 1598802 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1682 

- Ceres seq_id 1598803 

- Location of start within SEQ ID NO 1681: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 821

- gi No. 4539677

- Description: (AF061282) patatin-like protein [Sorghum bicolor] 

- % Identity: 93.5 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 1682: from 40 to 115 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1683 

- Ceres seq_id 1598804 

- Location of start within SEQ ID NO 1681: at 119 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 822 

- gi No. 4539677 

- Description: (AF061282) patatin-like protein [Sorghum bicolor] 

- % Identity: 93.5 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 1683: from 1 to 76

Maximum Length Sequence:

related to: 
Clone IDs: 

299467

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1684

- Ceres seq_id 1598805 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1685 

- Ceres seq_id 1598806 

- Location of start within SEQ ID NO 1684: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1685: from 28 to 122 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 823

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution

- % Identity: 96.9 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1685: from 97 to 128 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1686

- Ceres seq_id 1598807

- Location of start within SEQ ID NO 1684: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 824 

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution

- % Identity: 96.9 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1686: from 63 to 94 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1687 

- Ceres seq_id 1598808

- Location of start within SEQ ID NO 1684: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1687: from 2 to 96 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 825

- gi No. 3745759 

- Description: Chain B, X-Ray Structure Of The Nucleosome Core
Particle At 2.8 A Resolution >gi|3745763|pdb|1AOI|F Chain F, X-Ray Structure
Of The Nucleosome Core Particle At 2.8 A Resolution

- % Identity: 96.9 

- Alignment Length: 32 

- Location of Alignment in SEQ ID NO 1687: from 71 to 102 



Maximum Length Sequence:

related to:
Clone IDs: 

305131 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1688 

- Ceres seq_id 1598847 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1689 

- Ceres seq_id 1598848 

- Location of start within SEQ ID NO 1688: at 164 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S12e 

- Location within SEQ ID NO 1689: from 1 to 96 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 826

- gi No. 5106775 

- Description: (AF067732) ribosomal protein S12 [Hordeum vulgare] 

- % Identity: 92.9 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 1689: from 1 to 96 

Maximum Length Sequence: 

related to: 
Clone IDs: 

300786 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1690 

- Ceres seq_id 1598874 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1691 

- Ceres seq_id 1598875 

- Location of start within SEQ ID NO 1690: at 115 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 1691: from 10 to 75 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 827 

- gi No. 5730132 

- Description: (AL109796) snRNP Sm protein F-like [Arabidopsis
thaliana]

- % Identity: 87.2 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1691: from 1 to 86 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1692 

- Ceres seq_id 1598876 

- Location of start within SEQ ID NO 1690: at 196 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 1692: from 1 to 48 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 828

- gi No. 5730132 

- Description: (AL109796) snRNP Sm protein F-like [Arabidopsis
thaliana]

- % Identity: 87.2 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1692: from 1 to 59

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1693 

- Ceres seq_id 1598877 

- Location of start within SEQ ID NO 1690: at 235 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 829 

- gi No. 5730132 

- Description: (AL109796) snRNP Sm protein F-like [Arabidopsis
thaliana]

- % Identity: 87.2 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1693: from 1 to 46

Maximum Length Sequence:

related to: 
Clone IDs: 

301560 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1694 

- Ceres seq_id 1598878 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1695 

- Ceres seq_id 1598879 

- Location of start within SEQ ID NO 1694: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1695: from 74 to 161 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 830 

- gi No. 122022 

- Description: HISTONE H2B >gi|283025|pir||S22323 histone H2B -
wheat >gi|21801|emb|CAA42530| (X59873) histone H2B [Triticum aestivum]

- % Identity: 81.3 

- Alignment Length: 134 

- Location of Alignment in SEQ ID NO 1695: from 33 to 161 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1696 

- Ceres seq_id 1598880

- Location of start within SEQ ID NO 1694: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1697

- Ceres seq_id 1598881

- Location of start within SEQ ID NO 1694: at 97 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1697: from 42 to 129 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 831 

- gi No. 122022 

- Description: HISTONE H2B >gi|283025|pir||S22323 histone H2B -
wheat >gi|21801|emb|CAA42530| (X59873) histone H2B [Triticum aestivum]

- % Identity: 81.3 

- Alignment Length: 134 

- Location of Alignment in SEQ ID NO 1697: from 1 to 129 



Maximum Length Sequence: 

related to: 
Clone IDs: 

304048 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1698 

- Ceres seq_id 1598884 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1699 

- Ceres seq_id 1598885 

- Location of start within SEQ ID NO 1698: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1699: from 45 to 112 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 832

- gi No. 4566614 

- Description: (AF112887) actin depolymerizing factor [Populus alba 
x Populus tremula] 

- % Identity: 75.9 

- Alignment Length: 83 

- Location of Alignment in SEQ ID NO 1699: from 30 to 112 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1700 

- Ceres seq_id 1598886 

- Location of start within SEQ ID NO 1698: at 64 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1700: from 24 to 91 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 833 

- gi No. 4566614 

- Description: (AF112887) actin depolymerizing factor [Populus alba 
x Populus tremula] 

- % Identity: 75.9 

- Alignment Length: 83

- Location of Alignment in SEQ ID NO 1700: from 9 to 91

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1701 

- Ceres seq_id 1598887 

- Location of start within SEQ ID NO 1698: at 73 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1701: from 21 to 88 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 834 

- gi No. 4566614 

- Description: (AF112887) actin depolymerizing factor [Populus alba
x Populus tremula] 

- % Identity: 75.9 

- Alignment Length: 83

- Location of Alignment in SEQ ID NO 1701: from 6 to 88 

Maximum Length Sequence: 

related to: 
Clone IDs: 

304690 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1702

- Ceres seq_id 1598888
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1703 

- Ceres seq_id 1598889 

- Location of start within SEQ ID NO 1702: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1704

- Ceres seq_id 1598890

- Location of start within SEQ ID NO 1702: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 835

- gi No. 1350965 

- Description: 40S RIBOSOMAL PROTEIN S23 (S12) 

- % Identity: 90.9 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1704: from 1 to 11 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1705

- Ceres seq_id 1598891

- Location of start within SEQ ID NO 1702: at 160 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to:
Clone IDs: 

305650 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1706 

- Ceres seq_id 1598896 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1707 

- Ceres seq_id 1598897 

- Location of start within SEQ ID NO 1706: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1708 

- Ceres seq_id 1598898 

- Location of start within SEQ ID NO 1706: at 56 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L31e 

- Location within SEQ ID NO 1708: from 12 to 106 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 836 

- gi No. 1173027 

- Description: 60S RIBOSOMAL PROTEIN L31 >gi|915313 (U23784)
ribosomal protein L31 [Nicotiana glutinosa] 

- % Identity: 88 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 1708: from 12 to 119 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1709 

- Ceres seq_id 1598899 

- Location of start within SEQ ID NO 1706: at 212 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L31e 

- Location within SEQ ID NO 1709: from 1 to 54 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 837

- gi No. 1173027 

- Description: 60S RIBOSOMAL PROTEIN L31 >gi|915313 (U23784)
ribosomal protein L31 [Nicotiana glutinosa] 

- % Identity: 88 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 1709: from 1 to 67 

Maximum Length Sequence: 

related to: 
Clone IDs: 

310589 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1710 

- Ceres seq_id 1598904 
(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 1711 

- Ceres seq_id 1598905 

- Location of start within SEQ ID NO 1710: at 156 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1711: from 12 to 107 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 838 

- gi No. 1419370 

- Description: (X97726) actin depolymerizing factor [Zea mays] 

- % Identity: 99.1 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 1711: from 1 to 107 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1712 

- Ceres seq_id 1598906 

- Location of start within SEQ ID NO 1710: at 198 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1712: from 1 to 93 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 839

- gi No. 1419370 

- Description: (X97726) actin depolymerizing factor [Zea mays] 

- % Identity: 99.1 

- Alignment Length: 108

- Location of Alignment in SEQ ID NO 1712: from 1 to 93 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1713 

- Ceres seq_id 1598907 

- Location of start within SEQ ID NO 1710: at 255 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 1713: from 1 to 74 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 840

- gi No. 1419370 

- Description: (X97726) actin depolymerizing factor [Zea mays] 

- % Identity: 99.1 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 1713: from 1 to 74 



Maximum Length Sequence: 

related to: 
Clone IDs: 

310661 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1714 

- Ceres seq_id 1598914 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1715

- Ceres seq_id 1598915 

- Location of start within SEQ ID NO 1714: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 1715: from 64 to 128 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 841

- gi No. 5714762 

- Description: (AF173881) serine/threonine protein phosphatase 
PP2A-4 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 88.8 

- Alignment Length: 80

- Location of Alignment in SEQ ID NO 1715: from 49 to 128 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1716 

- Ceres seq_id 1598916 

- Location of start within SEQ ID NO 1714: at 147 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 1716: from 16 to 80 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 842 

- gi No. 5714762 

- Description: (AF173881) serine/threonine protein phosphatase 
PP2A-4 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 88.8 

- Alignment Length: 80

- Location of Alignment in SEQ ID NO 1716: from 1 to 80 

Maximum Length Sequence:

related to: 
Clone IDs: 

311316 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1717 

- Ceres seq_id 1598921 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1718 

- Ceres seq_id 1598922 

- Location of start within SEQ ID NO 1717: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 1718: from 22 to 133 aa.



(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

311539 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1719

- Ceres seq_id 1598927 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1720 

- Ceres seq_id 1598928 

- Location of start within SEQ ID NO 1719: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L24e 

- Location within SEQ ID NO 1720: from 3 to 63 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 843

- gi No. 1710521 

- Description: 60S RIBOSOMAL PROTEIN L24 >gi|1154859|emb|CAA63960|
(X94296) L24 ribosomal protein [Hordeum vulgare] 

- % Identity: 94 

- Alignment Length: 50 

- Location of Alignment in SEQ ID NO 1720: from 1 to 50 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1721 

- Ceres seq_id 1598929 

- Location of start within SEQ ID NO 1719: at 222 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1722 

- Ceres seq_id 1598930 

- Location of start within SEQ ID NO 1719: at 278 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 844

- gi No. 1710521 

- Description: 60S RIBOSOMAL PROTEIN L24 >gi|1154859|emb|CAA63960|
(X94296) L24 ribosomal protein [Hordeum vulgare] 

- % Identity: 93.7 

- Alignment Length: 63 

- Location of Alignment in SEQ ID NO 1722: from 1 to 55 

Maximum Length Sequence: 

related to: 
Clone IDs: 

311667 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1723 

- Ceres seq_id 1598932 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1724 

- Ceres seq_id 1598933 

- Location of start within SEQ ID NO 1723: at 1 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 1724: from 3 to 85 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 845

- gi No. 100639 

- Description: pollen allergen plb precursor - perennial ryegrass 

- % Identity: 80 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1724: from 3 to 17

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1725 

- Ceres seq_id 1598934 

- Location of start within SEQ ID NO 1723: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1726 

- Ceres seq_id 1598935 

- Location of start within SEQ ID NO 1723: at 96 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

311682 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1727 

- Ceres seq_id 1598936
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1728 

- Ceres seq_id 1598937 

- Location of start within SEQ ID NO 1727: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1729 

- Ceres seq_id 1598938 

- Location of start within SEQ ID NO 1727: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1730 

- Ceres seq_id 1598939 

- Location of start within SEQ ID NO 1727: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

- Alignment No. 846

- gi No. 3885884 

- Description: (AF093630) 60S ribosomal protein L21 [Oryza sativa] 

- % Identity: 81.6 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 1730: from 1 to 37 



Maximum Length Sequence:

related to: 
Clone IDs: 

311812 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1731 

- Ceres seq_id 1598943 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1732 

- Ceres seq_id 1598944 

- Location of start within SEQ ID NO 1731: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1733 

- Ceres seq_id 1598945 

- Location of start within SEQ ID NO 1731: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1734 

- Ceres seq_id 1598946 

- Location of start within SEQ ID NO 1731: at 204 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- G10 protein 

- Location within SEQ ID NO 1734: from 1 to 77 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 847 

- gi No. 2911068 

- Description: (AL021960) G10-like protein [Arabidopsis thaliana]

- % Identity: 85.7 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 1734: from 1 to 77 



Maximum Length Sequence:

related to: 
Clone IDs: 

311871 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1735 

- Ceres seq_id 1598953 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1736 

- Ceres seq_id 1598954 

- Location of start within SEQ ID NO 1735: at 8 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1737 

- Ceres seq_id 1598955 

- Location of start within SEQ ID NO 1735: at 109 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Translationally controlled tumor protein 

- Location within SEQ ID NO 1737: from 1 to 121 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 848 

- gi No. 549063 

- Description: TRANSLATIONALLY CONTROLLED TUMOR PROTEIN HOMOLOG
(TCTP) >gi|1072464|pir||A38958 IgE-dependent histamine-releasing factor
homolog - rice >gi|303835|dbj|BAA02151| (D12626) 21kd polypeptide [Oryza
sativa]

- % Identity: 74.6 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 1737: from 1 to 121 

Maximum Length Sequence : 

related to: 
Clone IDs: 

311993 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1738 

- Ceres seq_id 1598957 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1739 

- Ceres seq_id 1598958

- Location of start within SEQ ID NO 1738: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1740 

- Ceres seq_id 1598959 

- Location of start within SEQ ID NO 1738: at 200 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 849 

- gi No. 2224915 

- Description: (U95968) beta-expansin [Oryza sativa] 

- % Identity: 79.1

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 1740: from 14 to 79

Maximum Length Sequence: 

related to: 
Clone IDs: 

312247

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1741

- Ceres seq_id 1598960
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1742

- Ceres seq_id 1598961

- Location of start within SEQ ID NO 1741: at 35 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 850

- gi No. 1174846 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 3 (UBIQUITIN-
PROTEIN LIGASE 3) (UBIQUITIN CARRIER PROTEIN 3) >gi|1076425|pir||S43782
ubiquitin-conjugating enzyme UBC3 - Arabidopsis thaliana >gi|431262 (L19352)
ubiquitin conjugating 

- % Identity: 79.3

- Alignment Length: 29 

- Location of Alignment in SEQ ID NO 1742: from 27 to 55 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1743 

- Ceres seq_id 1598962 

- Location of start within SEQ ID NO 1741: at 211 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 1743: from 1 to 84 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 851

- gi No. 1174846 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 3 (UBIQUITIN-
PROTEIN LIGASE 3) (UBIQUITIN CARRIER PROTEIN 3) >gi|1076425|pir||S43782
ubiquitin-conjugating enzyme UBC3 - Arabidopsis thaliana >gi|431262 (L19352)
ubiquitin conjugating 

- % Identity: 90.1 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1743: from 1 to 84

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1744

- Ceres seq_id 1598963 

- Location of start within SEQ ID NO 1741: at 337 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 852 

- gi No. 1174846 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 3 (UBIQUITIN-
PROTEIN LIGASE 3) (UBIQUITIN CARRIER PROTEIN 3) >gi|1076425|pir||S43782
ubiquitin-conjugating enzyme UBC3 - Arabidopsis thaliana >gi|431262 (L19352)
ubiquitin conjugating 

- % Identity: 90.1 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 1744: from 1 to 42 



Maximum Length Sequence:

related to:
Clone IDs: 

312307 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1745

- Ceres seq_id 1598964
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1746 

- Ceres seq_id 1598965 

- Location of start within SEQ ID NO 1745: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Hsp90 protein 

- Location within SEQ ID NO 1746: from 41 to 93 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 853 

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA7797£
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa]

- % Identity: 94.3 

- Alignment Length: 53 

- Location of Alignment in SEQ ID NO 1746: from 41 to 93 

Maximum Length Sequence: 

related to: 
Clone IDs: 

312493 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1747 

- Ceres seq_id 1598972 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1748 

- Ceres seq_id 1598973 

- Location of start within SEQ ID NO 1747: at 115 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S13 

- Location within SEQ ID NO 1748: from 14 to 129 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 854

- gi No. 464707 

- Description: 40S RIBOSOMAL PROTEIN S18 >gi|480908|pir||S37496
ribosomal protein S18.A - Arabidopsis thaliana >gi|405613|emb|CAA80684|
(Z23165) ribosomal protein S18A [Arabidopsis thaliana]

>gi|434343|emb|CAA82273| (Z28701) S18

- % Identity: 84.6 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 1748: from 1 to 129 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1749

- Ceres seq_id 1598974 

- Location of start within SEQ ID NO 1747: at 193 nt.



(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S13

- Location within SEQ ID NO 1749: from 1 to 103 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 855 

- gi No. 464707 

- Description: 40S RIBOSOMAL PROTEIN S18 >gi|480908|pir||S37496
ribosomal protein S18.A - Arabidopsis thaliana >gi|405613|emb|CAA80684|
(Z23165) ribosomal protein S18A [Arabidopsis thaliana]

>gi|434343|emb|CAA82273| (Z28701) S18

- % Identity: 84.6 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 1749: from 1 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1750 

- Ceres seq_id 1598975 

- Location of start within SEQ ID NO 1747: at 202 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein S13 

- Location within SEQ ID NO 1750: from 1 to 100 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 856 

- gi No. 464707 

- Description: 40S RIBOSOMAL PROTEIN S18 >gi|480908|pir||S37496
ribosomal protein S18.A - Arabidopsis thaliana >gi|405613|emb|CAA80684|
(Z23165) ribosomal protein S18A [Arabidopsis thaliana]

>gi|434343|emb|CAA82273| (Z28701) S18

- % Identity: 84.6 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 1750: from 1 to 100 

Maximum Length Sequence: 

related to: 
Clone IDs: 

312504 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1751 

- Ceres seq_id 1598976 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1752 

- Ceres seq_id 1598977 

- Location of start within SEQ ID NO 1751: at 157 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 1752: from 1 to 69 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 857 

- gi No. 464986 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 9 (UBIQUITIN-
PROTEIN LIGASE 9) (UBIQUITIN CARRIER PROTEIN 9) (UBCAT4B)

>gi|421857|pir||S32674 ubiquitin--protein ligase (EC 6.3.2.19) UBC9 -
Arabidopsis thaliana >gi|297884|emb|CAA78714|

- % Identity: 95.7 

- Alignment Length: 69

- Location of Alignment in SEQ ID NO 1752: from 1 to 69

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1753 

- Ceres seq_id 1598978 

- Location of start within SEQ ID NO 1751: at 244 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 858

- gi No. 464986 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 9 (UBIQUITIN-
PROTEIN LIGASE 9) (UBIQUITIN CARRIER PROTEIN 9) (UBCAT4B)

>gi|421857|pir||S32674 ubiquitin--protein ligase (EC 6.3.2.19) UBC9 -
Arabidopsis thaliana >gi|297884|emb|CAA78714|

- % Identity: 95.7 

- Alignment Length: 69 

- Location of Alignment in SEQ ID NO 1753: from 1 to 40 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1754 

- Ceres seq_id 1598979 

- Location of start within SEQ ID NO 1751: at 268 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 859 

- gi No. 464986 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 9 (UBIQUITIN-
PROTEIN LIGASE 9) (UBIQUITIN CARRIER PROTEIN 9) (UBCAT4B)

>gi|421857|pir||S32674 ubiquitin--protein ligase (EC 6.3.2.19) UBC9 -
Arabidopsis thaliana >gi|297884|emb|CAA78714|

- % Identity: 95.7 

- Alignment Length: 69

- Location of Alignment in SEQ ID NO 1754: from 1 to 32 

Maximum Length Sequence : 

related to: 
Clone IDs: 

314173 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1755 

- Ceres seq_id 1599016 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1756 

- Ceres seq_id 1599017 

- Location of start within SEQ ID NO 1755: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L17 

- Location within SEQ ID NO 1756: from 52 to 148 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1757 

- Ceres seq_id 1599018 

- Location of start within SEQ ID NO 1755: at 106 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L17 

- Location within SEQ ID NO 1757: from 17 to 113 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1758 

- Ceres seq_id 1599019 

- Location of start within SEQ ID NO 1755: at 154 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein L17 

- Location within SEQ ID NO 1758: from 1 to 97 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

314219 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1759 

- Ceres seq_id 1599020
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1760

- Ceres seq_id 1599021 

- Location of start within SEQ ID NO 1759: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1761

- Ceres seq_id 1599022 

- Location of start within SEQ ID NO 1759: at 136 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 860

- gi No. 2208962 

- Description: (Y10118) signal recognition particle subunit 14 
[Oryza sativa] 

- % Identity: 77.2 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 1761: from 1 to 113

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1762 

- Ceres seq_id 1599023 

- Location of start within SEQ ID NO 1759: at 181 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 861

- gi No. 2208962

- Description: (Y10118) signal recognition particle subunit 14 
[Oryza sativa] 

- % Identity: 77.2 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 1762: from 1 to 98

Maximum Length Sequence: 

related to: 
Clone IDs: 

315294 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1763

- Ceres seq_id 1599058 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1764 

- Ceres seq_id 1599059 

- Location of start within SEQ ID NO 1763: at 158 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 1764: from 12 to 108 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 862

- gi No. 4836876 

- Description: (AC007260) Similar to dTDP-D-glucose 4,6-dehydratase
[Arabidopsis thaliana] 

- % Identity: 78.6 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 1764: from 7 to 108

Maximum Length Sequence: 

related to: 
Clone IDs: 

315661 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1765

- Ceres seq_id 1599062
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1766 

- Ceres seq_id 1599063 

- Location of start within SEQ ID NO 1765: at 131 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 863

- gi No. 464705

- Description: 40S RIBOSOMAL PROTEIN S13 >gi|419802|pir||S30146
ribosomal protein S13.e - maize >gi|288059|emb|CAA44311| (X62455)
cytoplasmatic ribosomal protein S13 [Zea mays]

- % Identity: 98.8

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1766: from 1 to 86

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1767 

- Ceres seq_id 1599064 

- Location of start within SEQ ID NO 1765: at 140 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 864

- gi No. 464705

- Description: 40S RIBOSOMAL PROTEIN S13 >gi|419802|pir||S30146
ribosomal protein S13.e - maize >gi|288059|emb|CAA44311| (X62455)
cytoplasmatic ribosomal protein S13 [Zea mays]

- % Identity: 98.8

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1767: from 1 to 83 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1768

- Ceres seq_id 1599065 

- Location of start within SEQ ID NO 1765: at 236 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 865

- gi No. 464705

- Description: 40S RIBOSOMAL PROTEIN S13 >gi|419802|pir||S30146
ribosomal protein S13.e - maize >gi|288059|emb|CAA44311| (X62455)
cytoplasmatic ribosomal protein S13 [Zea mays]

- % Identity: 98.8

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1768: from 1 to 51 

Maximum Length Sequence: 

related to: 
Clone IDs: 

316025 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1769

- Ceres seq_id 1599068 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1770 

- Ceres seq_id 1599069 

- Location of start within SEQ ID NO 1769: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1771 

- Ceres seq_id 1599070 

- Location of start within SEQ ID NO 1769: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1772 

- Ceres seq_id 1599071 

- Location of start within SEQ ID NO 1769: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 866

- gi No. 1173055

- Description: 60S RIBOSOMAL PROTEIN L11 (L5)
>gi|541961|pir||S42497 ribosomal protein L11.e - alfalfa
>gi|1076504|pir||S51819 RL5 ribosomal protein - alfalfa

>gi|463252|emb|CAA55090| (X78284) RL5 ribosomal protein [Medicago sativa]

- % Identity: 95.3

- Alignment Length: 43

- Location of Alignment in SEQ ID NO 1772: from 1 to 43 



Maximum Length Sequence: 

related to: 
Clone IDs: 

316197 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1773 

- Ceres seq_id 1599076 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1774 

- Ceres seq_id 1599077 

- Location of start within SEQ ID NO 1773: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 1774: from 1 to 152 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 867 

- gi No. 542179 

- Description: alpha tubulin - maize >gi|629837|pir||S39998 tubulin
alpha chain - maize (fragment) >gi|393401|emb|CAA52158| (X73980) alpha
tubulin [Zea mays]

- % Identity: 100 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 1774: from 1 to 152 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1775 

- Ceres seq_id 1599078 

- Location of start within SEQ ID NO 1773: at 176 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 1775: from 1 to 94 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 868

- gi No. 542179

- Description: alpha tubulin - maize >gi|629837|pir||S39998 tubulin
alpha chain - maize (fragment) >gi|393401|emb|CAA52158| (X73980) alpha
tubulin [Zea mays]

- % Identity: 100 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 1775: from 1 to 94 



Maximum Length Sequence:

related to:
Clone IDs: 

316376 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1776

- Ceres seq_id 1599084
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1777

- Ceres seq_id 1599085

- Location of start within SEQ ID NO 1776: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1778 

- Ceres seq_id 1599086 

- Location of start within SEQ ID NO 1776: at 127 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DnaJ domain 

- Location within SEQ ID NO 1778: from 12 to 76 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1779 

- Ceres seq_id 1599087 

- Location of start within SEQ ID NO 1776: at 263 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

316506 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1780 

- Ceres seq_id 1599092 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1781

- Ceres seq_id 1599093 

- Location of start within SEQ ID NO 1780: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp90 protein 

- Location within SEQ ID NO 1781: from 1 to 98 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 869

- gi No. 544242

- Description: ENDOPLASMIN HOMOLOG PRECURSOR (GRP94 HOMOLOG)
>gi|485498|pir||S33533 heat shock protein 90 homolog precursor - barley
>gi|22652|emb|CAA48143| (X67960) GRP94 homologue [Hordeum vulgare]

- % Identity: 81.6

- Alignment Length: 98

- Location of Alignment in SEQ ID NO 1781: from 1 to 98

Maximum Length Sequence : 

related to: 
Clone IDs: 

316629 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1782 

- Ceres seq_id 1599094 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1783 

- Ceres seq_id 1599095 

- Location of start within SEQ ID NO 1782: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1784 

- Ceres seq_id 1599096 

- Location of start within SEQ ID NO 1782: at 95 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 870 

- gi No. 2493053 

- Description: ATP SYNTHASE EPSILON CHAIN, MITOCHONDRIAL >gi|639793
(L39120) mitochondrial F1F0 ATP synthase epsilon subunit [Zea mays]

- % Identity: 98.2 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1784: from 1 to 56 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1785 

- Ceres seq_id 1599097 

- Location of start within SEQ ID NO 1782: at 140 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 871 

- gi No. 2493053 

- Description: ATP SYNTHASE EPSILON CHAIN, MITOCHONDRIAL >gi|639793
(L39120) mitochondrial F1F0 ATP synthase epsilon subunit [Zea mays]

- % Identity: 98.2 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1785: from 1 to 41 

Maximum Length Sequence: 

related to: 
Clone IDs: 

317148 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1786

- Ceres seq_id 1599107
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1787

- Ceres seq_id 1599108 

- Location of start within SEQ ID NO 1786: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 872 

- gi No. 401043 

- Description: 40S RIBOSOMAL PROTEIN S15 >gi|218131|dbj|BAA01746|
(D10962) unnamed protein product [Oryza sativa] 

- % Identity: 88.2 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 1787: from 99 to 112 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1788

- Ceres seq_id 1599109 

- Location of start within SEQ ID NO 1786: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 873

- gi No. 730645

- Description: 40S RIBOSOMAL PROTEIN S15 >gi|629556|pir||S43412
ribosomal protein S15 - Arabidopsis thaliana >gi|313152|emb|CAA80679|
(Z23161) ribosomal protein S15 [Arabidopsis thaliana]

>gi|313188|emb|CAA80681| (Z23162) ribosomal thaliana]

- % Identity: 79 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1788: from 45 to 124 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1789 

- Ceres seq_id 1599110

- Location of start within SEQ ID NO 1786: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 874

- gi No. 401043

- Description: 40S RIBOSOMAL PROTEIN S15 >gi|218131|dbj|BAA01746|
(D10962) unnamed protein product [Oryza sativa] 

- % Identity: 88.2 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 1789: from 60 to 73

Maximum Length Sequence : 

related to: 
Clone IDs: 

317222 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1790 

- Ceres seq_id 1599111 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1791 

- Ceres seq_id 1599112 

- Location of start within SEQ ID NO 1790: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1792

- Ceres seq_id 1599113 

- Location of start within SEQ ID NO 1790: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 875

- gi No. 2280480 

- Description: (AB002344) KIAA0346 [Homo sapiens] 

- % Identity: 72.2 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 1792: from 7 to 23 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1793 

- Ceres seq_id 1599114 

- Location of start within SEQ ID NO 1790: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 876

- gi No. 544277

- Description: FAS ANTIGEN LIGAND >gi|1083659|pir||A49266 fas
ligand - rat >gi|440179 (U03470) ligand for Fas antigen [Rattus norvegicus]

- % Identity: 75

- Alignment Length: 12

- Location of Alignment in SEQ ID NO 1793: from 4 to 15

Maximum Length Sequence: 

related to: 
Clone IDs: 

313090 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1794 

- Ceres seq_id 1599134 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1795 

- Ceres seq_id 1599135 

- Location of start within SEQ ID NO 1794: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1796 

- Ceres seq_id 1599136 

- Location of start within SEQ ID NO 1794: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Myb-like DNA-binding domain

- Location within SEQ ID NO 1796: from 83 to 126 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 877

- gi No. 2832387

- Description: (Z95745) R2R3-MYB transcription factor [Arabidopsis
thaliana]

- % Identity: 73.1

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1796: from 88 to 113 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1797

- Ceres seq_id 1599137 

- Location of start within SEQ ID NO 1794: at 45 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 1797: from 69 to 112 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 878 

- gi No. 2832387 

- Description: (Z95745) R2R3-MYB transcription factor [Arabidopsis
thaliana]

- % Identity: 73.1

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 1797: from 74 to 99



Maximum Length Sequence : 

related to: 
Clone IDs: 

313138 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1798

- Ceres seq_id 1599138
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1799

- Ceres seq_id 1599139 

- Location of start within SEQ ID NO 1798: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mucin-like glycoprotein 

- Location within SEQ ID NO 1799: from 11 to 142 aa.



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1800 

- Ceres seq_id 1599140 

- Location of start within SEQ ID NO 1798: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1801 

- Ceres seq_id 1599141 

- Location of start within SEQ ID NO 1798: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ras family

- Location within SEQ ID NO 1801: from 70 to 161 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 879 

- gi No. 303742 

- Description: (D12544) GTP-binding protein [Pisum sativum]
>gi|738936|prf||2001457D GTP-binding protein [Pisum sativum]

- % Identity: 93.3 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 1801: from 58 to 161 

Maximum Length Sequence: 

related to: 
Clone IDs: 

313872 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1802 

- Ceres seq_id 1599152 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1803 

- Ceres seq_id 1599153 

- Location of start within SEQ ID NO 1802: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 880 

- gi No. 4490330 

- Description: (AL035656) splicing factor-like protein [Arabidopsis
thaliana]

- % Identity: 84.8 

- Alignment Length: 165 

- Location of Alignment in SEQ ID NO 1803: from 1 to 165 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1804 

- Ceres seq_id 1599154 

- Location of start within SEQ ID NO 1802: at 120 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 881 

- gi No. 4490330 

- Description: (AL035656) splicing factor-like protein [Arabidopsis
thaliana]

- % Identity: 84.8 

- Alignment Length: 165 

- Location of Alignment in SEQ ID NO 1804: from 1 to 126 

Maximum Length Sequence: 

related to: 
Clone IDs: 

314609 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1805 

- Ceres seq_id 1599163 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1806

- Ceres seq_id 1599164

- Location of start within SEQ ID NO 1805: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 882

- gi No. 115511

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 33 

- Location of Alignment in SEQ ID NO 1806: from 53 to 85 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1807 

- Ceres seq_id 1599165 

- Location of start within SEQ ID NO 1805: at 23 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 883 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 33 

- Location of Alignment in SEQ ID NO 1807: from 46 to 78

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1808 

- Ceres seq_id 1599166 

- Location of start within SEQ ID NO 1805: at 265 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- EF hand 

- Location within SEQ ID NO 1808: from 14 to 40 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 884 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 1808: from 1 to 65 

Maximum Length Sequence: 

related to: 
Clone IDs: 

314685 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1809 

- Ceres seq_id 1599167 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1810

- Ceres seq_id 1599168 

- Location of start within SEQ ID NO 1809: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 885 

- gi No. 2499328 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 20 KD SUBUNIT
PRECURSOR (COMPLEX I-20KD) (CI-20KD) >gi|1235607|emb|CAA65451.1| (X96671)
NADH-ubiquinone oxidoreductase [Solanum tuberosum]

- % Identity: 86.4 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 1810: from 71 to 151 

Maximum Length Sequence: 

related to: 
Clone IDs: 

315543 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1811 

- Ceres seq_id 1599179 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1812 

- Ceres seq_id 1599180 

- Location of start within SEQ ID NO 1811: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1813 

- Ceres seq_id 1599181 

- Location of start within SEQ ID NO 1811: at 111 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 886

- gi No. 3914465

- Description: PHOTOSYSTEM I REACTION CENTRE SUBUNIT VI PRECURSOR
(LIGHT-HARVESTING COMPLEX I 11 KD PROTEIN) (PSI-H) >gi|2981207 (AF052076)
photosystem I complex PsaH subunit precursor [Zea mays] 

- % Identity: 93.5 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 1813: from 1 to 106 

Maximum Length Sequence: 

related to: 
Clone IDs: 

317293 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1814 

- Ceres seq_id 1599195 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1815 

- Ceres seq_id 1599196 

- Location of start within SEQ ID NO 1814: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1816 

- Ceres seq_id 1599198 

- Location of start within SEQ ID NO 1814: at 310 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 887

- gi No. 2961300 

- Description: (AJ225027) ribosomal protein L24 [Cicer arietinum] 

- % Identity: 81 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 1816: from 1 to 38

Maximum Length Sequence:

related to: 
Clone IDs: 

317975 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1817 

- Ceres seq_id 1599222 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1818 

- Ceres seq_id 1599223 

- Location of start within SEQ ID NO 1817: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1819 

- Ceres seq_id 1599224 

- Location of start within SEQ ID NO 1817: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1820

- Ceres seq_id 1599225

- Location of start within SEQ ID NO 1817: at 265 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L13e 

- Location within SEQ ID NO 1820: from 1 to 58 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 888

- gi No. 730450

- Description: 60S RIBOSOMAL PROTEIN L13-2 (COLD INDUCED PROTEIN
C24B) >gi|480649|pir||S37134 cold-induced protein BnC24B - rape
>gi|398922|emb|CAA80343| (Z22620) cold induced protein (BnC24B) [Brassica
napus]

- % Identity: 88

- Alignment Length: 117 

- Location of Alignment in SEQ ID NO 1820: from 1 to 58 



Maximum Length Sequence: 

related to: 
Clone IDs: 

317995 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1821 

- Ceres seq_id 1599226 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1822 

- Ceres seq_id 1599227 

- Location of start within SEQ ID NO 1821: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L23

- Location within SEQ ID NO 1822: from 108 to 161 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 889

- gi No. 585876 

- Description: 60S RIBOSOMAL PROTEIN L23A (L25)
>gi|1084424|pir||S48026 ribosomal protein L25 - common tobacco >gi|310935
(L18908) 60S ribosomal protein L25 [Nicotiana tabacum]

- % Identity: 82.5

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1822: from 39 to 161 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1823 

- Ceres seq_id 1599228 

- Location of start within SEQ ID NO 1821: at 116 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L23 

- Location within SEQ ID NO 1823: from 70 to 123 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 890

- gi No. 585876 

- Description: 60S RIBOSOMAL PROTEIN L23A (L25)
>gi|1084424|pir||S48026 ribosomal protein L25 - common tobacco >gi|310935
(L18908) 60S ribosomal protein L25 [Nicotiana tabacum]

- % Identity: 82.5

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 1823: from 1 to 123 



Maximum Length Sequence: 

related to: 
Clone IDs: 

318140 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1824 

- Ceres seq_id 1599235 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1825 

- Ceres seq_id 1599236

- Location of start within SEQ ID NO 1824: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 891

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 80.2 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 1825: from 24 to 134 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1826 

- Ceres seq_id 1599237

- Location of start within SEQ ID NO 1824: at 64 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 892

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 80.2 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 1826: from 3 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1827 

- Ceres seq_id 1599238

- Location of start within SEQ ID NO 1824: at 220 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 893

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 80.2 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 1827: from 1 to 61 



Maximum Length Sequence: 

related to: 
Clone IDs: 

318251 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1828 

- Ceres seq_id 1599243 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1829 

- Ceres seq_id 1599244 

- Location of start within SEQ ID NO 1828: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S10 

- Location within SEQ ID NO 1829: from 54 to 149 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 894

- gi No. 548851

- Description: 40S RIBOSOMAL PROTEIN S20 >gi|481226|pir||S38356
ribosomal protein S20 - rice >gi|391875|dbj|BAA02157| (D12632) 40S subunit
ribosomal protein [Oryza sativa]

- % Identity: 83.3 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 1829: from 40 to 153 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1830 

- Ceres seq_id 1599245 

- Location of start within SEQ ID NO 1828: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1831 

- Ceres seq_id 1599246 

- Location of start within SEQ ID NO 1828: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S10 

- Location within SEQ ID NO 1831: from 26 to 121 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 895

- gi No. 548851

- Description: 40S RIBOSOMAL PROTEIN S20 >gi|481226|pir||S38356
ribosomal protein S20 - rice >gi|391875|dbj|BAA02157| (D12632) 40S subunit
ribosomal protein [Oryza sativa]

- % Identity: 83.3 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 1831: from 12 to 125 



Maximum Length Sequence: 

related to: 
Clone IDs: 

318670 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1832 

- Ceres seq_id 1599263 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1833 

- Ceres seq_id 1599264

- Location of start within SEQ ID NO 1832: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 1833: from 1 to 56 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 896 

- gi No. 2497897 

- Description: METALLOTHIONEIN-LIKE PROTEIN TYPE 2 B >gi|1449138
(L77966) homologue [Lycopersicon esculentum] 

- % Identity: 70 

- Alignment Length: 50

- Location of Alignment in SEQ ID NO 1833: from 1 to 50



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1834 

- Ceres seq_id 1599265 

- Location of start within SEQ ID NO 1832: at 162 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 897

- gi No. 2497897

- Description: METALLOTHIONEIN-LIKE PROTEIN TYPE 2 B >gi|1449138
(L77966) homologue [Lycopersicon esculentum]

- % Identity: 70 

- Alignment Length: 50 

- Location of Alignment in SEQ ID NO 1834: from 1 to 26 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1835 

- Ceres seq_id 1599266 

- Location of start within SEQ ID NO 1832: at 174 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 898

- gi No. 2497897

- Description: METALLOTHIONEIN-LIKE PROTEIN TYPE 2 B >gi|1449138
(L77966) homologue [Lycopersicon esculentum]

- % Identity: 70

- Alignment Length: 50

- Location of Alignment in SEQ ID NO 1835: from 1 to 22 



Maximum Length Sequence: 

related to: 
Clone IDs: 

318674 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1836 

- Ceres seq_id 1599267
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1837 

- Ceres seq_id 1599268 

- Location of start within SEQ ID NO 1836: at 100 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1838 

- Ceres seq_id 1599269 

- Location of start within SEQ ID NO 1836: at 254 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 899

- gi No. 102820

- Description: major ampullate fibroin protein - orb spider
(Nephila clavipes) (fragment) >gi|159712 (M37137) dragline silk fibroi
[Nephila clavipes]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 1838: from 35 to 46

Maximum Length Sequence: 

related to: 
Clone IDs: 

318828 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1839 

- Ceres seq_id 1599274 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1840 

- Ceres seq_id 1599275

- Location of start within SEQ ID NO 1839: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1841 

- Ceres seq_id 1599276 

- Location of start within SEQ ID NO 1839: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- Transcription factor S-II (TFIIS) 

- Location within SEQ ID NO 1841: from 4 to 113 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1842 

- Ceres seq_id 1599277 

- Location of start within SEQ ID NO 1839: at 126 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Transcription factor S-II (TFIIS)

- Location within SEQ ID NO 1842: from 1 to 101 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

318947 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1843 

- Ceres seq_id 1599278 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1844 

- Ceres seq_id 1599279 

- Location of start within SEQ ID NO 1843: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 900

- gi No. 5456980 

- Description: (AB024370) ORF1 [TT virus] 

- % Identity: 72.2 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 1844: from 118 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1845 

- Ceres seq_id 1599280 

- Location of start within SEQ ID NO 1843: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 1845: from 17 to 131 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 901 

- gi No. 122085 

- Description: HISTONE H3 >gi|81641|pir||S06250 histone H3 -
Arabidopsis thaliana >gi|82482|pir||S04099 histone H3 (variant H3R-21) - ri
>gi|1362194|pir||S57626 histone H3 - maize >gi|20251|emb|CAA31969| (X13678)
histone H3 (AA 1-136) [Oryza

- % Identity: 97.7 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 1845: from 1 to 131 

Maximum Length Sequence: 

related to: 
Clone IDs: 

319315 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1846

- Ceres seq_id 1599297 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1847 

- Ceres seq_id 1599298

- Location of start within SEQ ID NO 1846: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1848 

- Ceres seq_id 1599299 

- Location of start within SEQ ID NO 1846: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 902 

- gi No. 2431767 

- Description: (U62751) acidic ribosomal protein P3a [Zea mays] 

- % Identity: 97.9 

- Alignment Length: 48

- Location of Alignment in SEQ ID NO 1848: from 78 to 123

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1849

- Ceres seq_id 1599300 

- Location of start within SEQ ID NO 1846: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 903 

- gi No. 2431767 

- Description: (U62751) acidic ribosomal protein P3a [Zea mays] 

- % Identity: 94 

- Alignment Length: 50 

- Location of Alignment in SEQ ID NO 1849: from 28 to 77

Maximum Length Sequence:

related to: 
Clone IDs: 

319418 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1850 

- Ceres seq_id 1599315 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1851

- Ceres seq_id 1599316

- Location of start within SEQ ID NO 1850: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 1851: from 92 to 151 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 904 

- gi No. 2982253 

- Description: (AF051209) CROC-1-like protein [Picea mariana]

- % Identity: 84.7 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 1851: from 61 to 157 

Maximum Length Sequence: 

related to: 
Clone IDs: 

319432 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1852 

- Ceres seq_id 1599317 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1853 

- Ceres seq_id 1599318 

- Location of start within SEQ ID NO 1852: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1854 

- Ceres seq_id 1599319 

- Location of start within SEQ ID NO 1852: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 905 

- gi No. 128388 

- Description: NONSPECIFIC LIPID-TRANSFER PROTEIN PRECURSOR (LTP)
(PHOSPHOLIPID TRANSFER PROTEIN) (PLTP) >gi|82711|pir||A31779 phospholipid
transfer protein 9C2 precursor - maize >gi|168576 (J04176) phospholipid
transfer protein precursor [Zea mays]

- % Identity: 100 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 1854: from 1 to 12 

Maximum Length Sequence:

related to: 
Clone IDs: 

319452 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1855 

- Ceres seq_id 1599320 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1856 

- Ceres seq_id 1599321 

- Location of start within SEQ ID NO 1855: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S8 

- Location within SEQ ID NO 1856: from 43 to 133 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1857 

- Ceres seq_id 1599322 

- Location of start within SEQ ID NO 1855: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S8

- Location within SEQ ID NO 1857: from 6 to 96 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1858 

- Ceres seq_id 1599323 

- Location of start within SEQ ID NO 1855: at 148 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S8

- Location within SEQ ID NO 1858: from 1 to 84 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

319520 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1859

- Ceres seq_id 1599324 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1860 

- Ceres seq_id 1599325 

- Location of start within SEQ ID NO 1859: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1861 

- Ceres seq_id 1599326 

- Location of start within SEQ ID NO 1859: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 1861: from 62 to 114 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 906

- gi No. 3046731

- Description: (AJ005373) protein kinase [Craterostigma
plantagineum]

- % Identity: 92.9

- Alignment Length: 56

- Location of Alignment in SEQ ID NO 1861: from 59 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1862

- Ceres seq_id 1599327 

- Location of start within SEQ ID NO 1859: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 1862: from 10 to 89 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

320152 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1863 

- Ceres seq_id 1599360 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1864

- Ceres seq_id 1599361

- Location of start within SEQ ID NO 1863: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 1864: from 45 to 157 aa.



(Dp) Related Amino Acid Sequences 
- Alignment No. 907

- gi No. 1174162

- Description: (U44976) ubiquitin-conjugating enzyme [Arabidopsis
thaliana] >gi|3746915 (AF091106) E2 ubiquitin-conjugating-like enzyme
[Arabidopsis thaliana]

- % Identity: 92 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1864: from 45 to 157 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1865 

- Ceres seq_id 1599362 

- Location of start within SEQ ID NO 1863: at 225 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 1865: from 1 to 83 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 908 

- gi No. 1174162 

- Description: (U44976) ubiquitin-conjugating enzyme [Arabidopsis
thaliana] >gi|3746915 (AF091106) E2 ubiquitin-conjugating-like enzyme
[Arabidopsis thaliana] 

- % Identity: 92 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 1865: from 1 to 83

Maximum Length Sequence:

related to: 
Clone IDs: 

320219 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1866

- Ceres seq_id 1599363 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1867 

- Ceres seq_id 1599364 

- Location of start within SEQ ID NO 1866: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Sugar (and other) transporter

- Location within SEQ ID NO 1867: from 48 to 103 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1868

- Ceres seq_id 1599365

- Location of start within SEQ ID NO 1866: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sugar (and other) transporter 

- Location within SEQ ID NO 1868: from 43 to 98 aa.
(Dp) Related Amino Acid Sequences



Maximum Length Sequence:

related to:
Clone IDs:

320282 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1869

- Ceres seq_id 1599373 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1870 

- Ceres seq_id 1599374 

- Location of start within SEQ ID NO 1869: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Calreticulin family

- Location within SEQ ID NO 1870: from 50 to 150 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 909

- gi No. 510907 

- Description: (Z35108) calnexin [Helianthus tuberosus] 

- % Identity: 70.3 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 1870: from 35 to 150 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1871 

- Ceres seq_id 1599375 

- Location of start within SEQ ID NO 1869: at 76 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Calreticulin family 

- Location within SEQ ID NO 1871: from 25 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 910 

- gi No. 510907 

- Description: (Z35108) calnexin [Helianthus tuberosus] 

- % Identity: 70.3 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 1871: from 10 to 125 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1872 

- Ceres seq_id 1599376 

- Location of start within SEQ ID NO 1869: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Calreticulin family 

- Location within SEQ ID NO 1872: from 24 to 124 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 911 

- gi No. 510907 

- Description: (Z35108) calnexin [Helianthus tuberosus] 

- % Identity: 70.3 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 1872: from 9 to 124 



Maximum Length Sequence: 
related to:
Clone IDs:

320388 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1873 

- Ceres seq_id 1599385 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1874 

- Ceres seq_id 1599386 

- Location of start within SEQ ID NO 1873: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 912 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga
menziesii]

- % Identity: 98.5 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 1874: from 29 to 96 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1875

- Ceres seq_id 1599387

- Location of start within SEQ ID NO 1873: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 913 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga
menziesii]

- % Identity: 98.5 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 1875: from 1 to 68 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1876

- Ceres seq_id 1599388

- Location of start within SEQ ID NO 1873: at 169 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 914 

- gi No. 4090257 

- Description: (AJ131732) ribosomal protein L37A [Pseudotsuga
menziesii]

- % Identity: 98.5

- Alignment Length: 68

- Location of Alignment in SEQ ID NO 1876: from 1 to 40



Maximum Length Sequence:

related to:
Clone IDs:

320423

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1877

- Ceres seq_id 1599393
(B) Polypeptide Sequence

Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 

- Pat. Appln. SEQ ID NO 1878 

- Ceres seq_id 1599394 

- Location of start within SEQ ID NO 1877: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 915 

- gi No. 2137785 

- Description: SmD homolog, liver - mouse (fragment)
>gi|557920|bbs|150447 (S71494) SmD homolog {Gly-Arg repeat} [mice, liver,
Peptide Partial, 59 aa] [Mus sp.] 

- % Identity: 78.9 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 1878: from 108 to 126 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1879

- Ceres seq_id 1599395

- Location of start within SEQ ID NO 1877: at 80 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 916 

- gi No. 2137785 

- Description: SmD homolog, liver - mouse (fragment)
>gi|557920|bbs|150447 (S71494) SmD homolog {Gly-Arg repeat} [mice, liver,
Peptide Partial, 59 aa] [Mus sp.]

- % Identity: 78.9

- Alignment Length: 19

- Location of Alignment in SEQ ID NO 1879: from 82 to 100

Maximum Length Sequence:

related to: 
Clone IDs: 

320538 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1880 

- Ceres seq_id 1599396 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1881 

- Ceres seq_id 1599397 

- Location of start within SEQ ID NO 1880: at 229 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 917 

- gi No. 3914431 

- Description: PROTEASOME COMPONENT C8 (MACROPAIN SUBUNIT C8)
(MULTICATALYTIC ENDOPEPTIDASE COMPLEX SUBUNIT C8) >gi|2285802|dbj|BAA21651|
(D78173) 26S proteasome alpha subunit [Spinacia oleracea]

- % Identity: 92

- Alignment Length: 75

- Location of Alignment in SEQ ID NO 1881: from 1 to 74 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1882 

- Ceres seq_id 1599398 

- Location of start within SEQ ID NO 1880: at 251 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

320628 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1883 

- Ceres seq_id 1599403 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1884 

- Ceres seq_id 1599404 

- Location of start within SEQ ID NO 1883: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1885 

- Ceres seq_id 1599405 

- Location of start within SEQ ID NO 1883: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1886 

- Ceres seq_id 1599406 

- Location of start within SEQ ID NO 1883: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Metallothionein 

- Location within SEQ ID NO 1886: from 29 to 64 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 918 

- gi No. 3694984 

- Description: (AF093585) metallothionein-l-like protein 
[Pimpinella brachycarpa] 

- % Identity: 78.3 

- Alignment Length: 23

- Location of Alignment in SEQ ID NO 1886: from 26 to 48

Maximum Length Sequence: 

related to: 
Clone IDs: 

320722 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1887 

- Ceres seq_id 1599416 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1888 

- Ceres seq_id 1599417 

- Location of start within SEQ ID NO 1887: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 919 

- gi No. 2285792 

- Description: (AB004568) cyanase [Arabidopsis thaliana]
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 75.2

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 1888: from 41 to 168 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1889

- Ceres seq_id 1599418

- Location of start within SEQ ID NO 1887: at 121 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 920 

- gi No. 2285792 

- Description: (AB004568) cyanase [Arabidopsis thaliana]
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 75.2

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 1889: from 1 to 128

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1890

- Ceres seq_id 1599419 

- Location of start within SEQ ID NO 1887: at 163 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 921 

- gi No. 2285792 

- Description: (AB004568) cyanase [Arabidopsis thaliana]
>gi|3287503|dbj|BAA31224| (AB015748) cyanase [Arabidopsis thaliana]

- % Identity: 75.2

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 1890: from 1 to 114 

Maximum Length Sequence: 

related to: 
Clone IDs: 

320798 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1891

- Ceres seq_id 1599424
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1892

- Ceres seq_id 1599425

- Location of start within SEQ ID NO 1891: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 

Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 1892: from 30 to 126 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1893

- Ceres seq_id 1599426

- Location of start within SEQ ID NO 1891: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1894 

- Ceres seq_id 1599427 

- Location of start within SEQ ID NO 1891: at 93 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Flavin-binding monooxygenase-like

- Location within SEQ ID NO 1894: from 7 to 117 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

320985 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1895

- Ceres seq_id 1599434
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1896

- Ceres seq_id 1599435

- Location of start within SEQ ID NO 1895: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 922 

- gi No. 3153821 

- Description: (AF062655) plenty-of-prolines-101; POP101;
SH3-philo-protein [Mus musculus]

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 1896: from 17 to 32

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1897

- Ceres seq_id 1599436

- Location of start within SEQ ID NO 1895: at 106 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1898

- Ceres seq_id 1599437

- Location of start within SEQ ID NO 1895: at 191 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 1898: from 15 to 102 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

320997 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1899 

- Ceres seq_id 1599438 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1900 

- Ceres seq_id 1599439 

- Location of start within SEQ ID NO 1899: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1901 

- Ceres seq_id 1599440 

- Location of start within SEQ ID NO 1899: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1902 

- Ceres seq_id 1599441

- Location of start within SEQ ID NO 1899: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 1902: from 7 to 128 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs:

321137 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1903 

- Ceres seq_id 1599443 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1904 

- Ceres seq_id 1599444 

- Location of start within SEQ ID NO 1903: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 923 

- gi No. 1168940 

- Description: CHORISMATE MUTASE PRECURSOR (CM-1) 

>gi|629509|pir||S38958 chorismate mutase precursor - Arabidopsis thaliana
>gi|429153|emb|CAA81286| (Z26519) chorismate mutase precursor [Arabidopsis
thaliana] 

- % Identity: 72.1 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1904: from 76 to 160

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1905 

- Ceres seq_id 1599445 

- Location of start within SEQ ID NO 1903: at 81 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 924

- gi No. 1168940 

- Description: CHORISMATE MUTASE PRECURSOR (CM-1) 

>gi|629509|pir||S38958 chorismate mutase precursor - Arabidopsis thaliana
>gi|429153|emb|CAA81286| (Z26519) chorismate mutase precursor [Arabidopsis
thaliana] 

- % Identity: 72.1 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1905: from 50 to 134 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1906

- Ceres seq_id 1599446

- Location of start within SEQ ID NO 1903: at 237 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 925

- gi No. 1168940 

- Description: CHORISMATE MUTASE PRECURSOR (CM-1) 

>gi|629509|pir||S38958 chorismate mutase precursor - Arabidopsis thaliana
>gi|429153|emb|CAA81286| (Z26519) chorismate mutase precursor [Arabidopsis
thaliana] 

- % Identity: 72.1 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 1906: from 1 to 82 



Maximum Length Sequence: 

related to: 
Clone IDs: 

321178 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1907 

- Ceres seq_id 1599450 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1908 

- Ceres seq_id 1599451 

- Location of start within SEQ ID NO 1907: at 67 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 926 

- gi No. 544423 

- Description: GLYCINE-RICH RNA-BINDING PROTEIN 2

>gi|485421|pir||S12312 glycine-rich RNA-binding protein (clone S2) - sorghum
>gi|21625|emb|CAA40862| (X57662) glycine-rich RNA-binding protein [Sorghum
bicolor] 

- % Identity: 71.4 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1908: from 12 to 32 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1909 

- Ceres seq_id 1599452 

- Location of start within SEQ ID NO 1907: at 77 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1910 

- Ceres seq_id 1599453 

- Location of start within SEQ ID NO 1907: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321207 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1911 

- Ceres seq_id 1599454 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1912 

- Ceres seq_id 1599455 

- Location of start within SEQ ID NO 1911: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1913 

- Ceres seq_id 1599456 

- Location of start within SEQ ID NO 1911: at 148 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 927 

- gi No. 2737973 

- Description: (U83625) protein kinase ZmMEK1 [Zea mays]

- % Identity: 100 

- Alignment Length: 101 

- Location of Alignment in SEQ ID NO 1913: from 1 to 100 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321258 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 1914

- Ceres seq_id 1599461
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1915 

- Ceres seq_id 1599462 

- Location of start within SEQ ID NO 1914: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1916 

- Ceres seq_id 1599463 

- Location of start within SEQ ID NO 1914: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 928 

- gi No. 5732922 

- Description: (AF167708) excretory/secretory mucin MUC-3 [Toxocara
canis]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1916: from 6 to 16 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1917 

- Ceres seq_id 1599464 

- Location of start within SEQ ID NO 1914: at 35 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321382 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1918 

- Ceres seq_id 1599474 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1919 

- Ceres seq_id 1599475 

- Location of start within SEQ ID NO 1918: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1920 

- Ceres seq_id 1599476 

- Location of start within SEQ ID NO 1918: at 205 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family

- Location within SEQ ID NO 1920: from 11 to 86 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 929

- gi No. 3341681 

- Description: (AC003672) small GTP-binding protein [Arabidopsis 
thaliana] >gi|741994|prf||2008312A GTP-binding protein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1920: from 1 to 86 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1921 

- Ceres seq_id 1599477

- Location of start within SEQ ID NO 1918: at 292 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ras family 

- Location within SEQ ID NO 1921: from 1 to 57 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 930 

- gi No. 3341681 

- Description: (AC003672) small GTP-binding protein [Arabidopsis 
thaliana] >gi|741994|prf||2008312A GTP-binding protein [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1921: from 1 to 57 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321524 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1922 

- Ceres seq_id 1599487 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1923 

- Ceres seq_id 1599488 

- Location of start within SEQ ID NO 1922: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1924 

- Ceres seq_id 1599489 

- Location of start within SEQ ID NO 1922: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribonucleotide reductases 

- Location within SEQ ID NO 1924: from 47 to 135 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 931 

- gi No. 1710401 

- Description: RIBONUCLEOSIDE-DIPHOSPHATE REDUCTASE SMALL CHAIN
(RIBONUCLEOTIDE REDUCTASE) (R2 SUBUNIT) >gi|1044912|emb|CAA63194| (X92443)
ribonucleotide reductase R2 [Nicotiana tabacum]

- % Identity: 80.3

- Alignment Length: 127

- Location of Alignment in SEQ ID NO 1924: from 10 to 135

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1925 

- Ceres seq_id 1599490 

- Location of start within SEQ ID NO 1922: at 168 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribonucleotide reductases 

- Location within SEQ ID NO 1925: from 21 to 109 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 932 

- gi No. 1710401 

- Description: RIBONUCLEOSIDE-DIPHOSPHATE REDUCTASE SMALL CHAIN 
(RIBONUCLEOTIDE REDUCTASE) (R2 SUBUNIT) >gi|1044912|emb|CAA63194| (X92443)
ribonucleotide reductase R2 [Nicotiana tabacum] 

- % Identity: 80.3 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 1925: from 1 to 109 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321622 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1926 

- Ceres seq_id 1599491 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1927 

- Ceres seq_id 1599492 

- Location of start within SEQ ID NO 1926: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1928 

- Ceres seq_id 1599493 

- Location of start within SEQ ID NO 1926: at 238 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Germin family 

- Location within SEQ ID NO 1928: from 30 to 87 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 933 

- gi No. 2655291 

- Description: (AF032974) germin-like protein 4 [Oryza sativa] 

- % Identity: 81.6 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1928: from 1 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1929 

- Ceres seq_id 1599494 

- Location of start within SEQ ID NO 1926: at 322 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Germin family 

- Location within SEQ ID NO 1929: from 2 to 59 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 934 

- gi No. 2655291 

- Description: (AF032974) germin-like protein 4 [Oryza sativa] 

- % Identity: 81.6 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 1929: from 1 to 59 



Maximum Length Sequence: 

related to: 
Clone IDs: 

321649 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1930 

- Ceres seq_id 1599495 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1931 

- Ceres seq_id 1599496 

- Location of start within SEQ ID NO 1930: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protamine P1

- Location within SEQ ID NO 1931: from 89 to 155 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 935 

- gi No. 3023480 

- Description: VOLTAGE-GATED POTASSIUM CHANNEL PROTEIN KV3.3
(KSHIIID) >gi|205045 (M84210) [Rattus norvegicus mRNA sequence.], gene
product [Rattus norvegicus] >gi|228958|prf||1814498A voltage-activating K
channel [Rattus norvegicus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1931: from 47 to 57 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1932 

- Ceres seq_id 1599497 

- Location of start within SEQ ID NO 1930: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- short chain dehydrogenase 

- Location within SEQ ID NO 1932: from 85 to 169 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 936 

- gi No. 2586129 

- Description: (U89509) b-keto acyl reductase [Zea mays] 

- % Identity: 100 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 1932: from 29 to 169 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1933

- Ceres seq_id 1599498

- Location of start within SEQ ID NO 1930: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- short chain dehydrogenase 

- Location within SEQ ID NO 1933: from 57 to 141 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 937 

- gi No. 2586129 

- Description: (U89509) b-keto acyl reductase [Zea mays] 

- % Identity: 100 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 1933: from 1 to 141 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321802 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1934 

- Ceres seq_id 1599507 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1935 

- Ceres seq_id 1599508 

- Location of start within SEQ ID NO 1934: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1936 

- Ceres seq_id 1599509 

- Location of start within SEQ ID NO 1934: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1937 

- Ceres seq_id 1599510 

- Location of start within SEQ ID NO 1934: at 248 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 938 

- gi No. 133867 

- Description: 40S RIBOSOMAL PROTEIN S11 >gi|82722|pir||S16577
ribosomal protein S11 - maize >gi|22470|emb|CAA39438| (X55967) ribosomal
protein S11 [Zea mays]

- % Identity: 100

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 1937: from 1 to 88 

Maximum Length Sequence:

related to:
Clone IDs:

322059 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1938 

- Ceres seq_id 1599520 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1939 

- Ceres seq_id 1599521 

- Location of start within SEQ ID NO 1938: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Prion protein 

- Location within SEQ ID NO 1939: from 18 to 105 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 939 

- gi No. 100219 

- Description: glycine-rich protein (clone uK-4) - tomato 
>gi|1345534|emb|CAA39225| (X55696) glycine-rich protein [Lycopersicon
esculentum] 

- % Identity: 71.1 

- Alignment Length: 47 

- Location of Alignment in SEQ ID NO 1939: from 45 to 89

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1940 

- Ceres seq_id 1599522 

- Location of start within SEQ ID NO 1938: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1941 

- Ceres seq_id 1599523 

- Location of start within SEQ ID NO 1938: at 63 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322088 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1942 

- Ceres seq_id 1599524 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1943 

- Ceres seq_id 1599525 

- Location of start within SEQ ID NO 1942: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1944

- Ceres seq_id 1599526

- Location of start within SEQ ID NO 1942: at 201 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 940

- gi No. 2190548

- Description: (AC001229) EST gb|ATTS1121 comes from this gene.
[Arabidopsis thaliana] 

- % Identity: 84.2 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 1944: from 30 to 85 

Maximum Length Sequence:

related to: 
Clone IDs: 

322114 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1945 

- Ceres seq_id 1599530 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1946

- Ceres seq_id 1599531 

- Location of start within SEQ ID NO 1945: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1947 

- Ceres seq_id 1599532 

- Location of start within SEQ ID NO 1945: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1948 

- Ceres seq_id 1599533 

- Location of start within SEQ ID NO 1945: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mucin-like glycoprotein 

- Location within SEQ ID NO 1948: from 78 to 132 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

322228 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1949

- Ceres seq_id 1599544 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1950 

- Ceres seq_id 1599545

- Location of start within SEQ ID NO 1949: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 1950: from 93 to 158 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 941 

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase 
[Populus tremula x Populus tremuloides] 

- % Identity: 83.5 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 1950: from 51 to 158 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1951 

- Ceres seq_id 1599546 

- Location of start within SEQ ID NO 1949: at 125 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 1951: from 52 to 117 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 942 

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase 
[Populus tremula x Populus tremuloides] 

- % Identity: 83.5 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 1951: from 10 to 117 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1952 

- Ceres seq_id 1599547 

- Location of start within SEQ ID NO 1949: at 206 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 1952: from 25 to 90 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 943 

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase 
[Populus tremula x Populus tremuloides] 

- % Identity: 83.5 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 1952: from 1 to 90 



Maximum Length Sequence: 

related to: 
Clone IDs: 

322260 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1953 

- Ceres seq_id 1599548
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1954

- Ceres seq_id 1599549

- Location of start within SEQ ID NO 1953: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Collagen triple helix repeat (20 copies) 

- Location within SEQ ID NO 1954: from 6 to 51 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 944

- gi No. 2879816

- Description: (AJ000515) cyclic nucleotide-gated cation channel
beta subunit [Rattus norvegicus] >gi|3192883 (AF068572) cyclic
nucleotide-gated channel beta subunit 1b [Rattus norvegicus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1954: from 1 to 11 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1955 

- Ceres seq_id 1599550 

- Location of start within SEQ ID NO 1953: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4

- Location within SEQ ID NO 1955: from 33 to 148 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 945

- gi No. 1708107

- Description: HISTONE H2B >gi|473605 (008226) histone H2B [Zea mays]

- % Identity: 98.7

- Alignment Length: 149

- Location of Alignment in SEQ ID NO 1955: from 1 to 149



Maximum Length Sequence:

related to: 
Clone IDs: 

322311 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1956

- Ceres seq_id 1599551 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1957 

- Ceres seq_id 1599552 

- Location of start within SEQ ID NO 1956: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 1957: from 39 to 106 aa. 



(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1958 

- Ceres seq_id 1599553

- Location of start within SEQ ID NO 1956: at 187 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1959 

- Ceres seq_id 1599554 

- Location of start within SEQ ID NO 1956: at 196 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322312 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1960 

- Ceres seq_id 1599555 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1961 

- Ceres seq_id 1599556 

- Location of start within SEQ ID NO 1960: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1962 

- Ceres seq_id 1599557

- Location of start within SEQ ID NO 1960: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1963 

- Ceres seq_id 1599558 

- Location of start within SEQ ID NO 1960: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 1963: from 2 to 106 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322422 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1964 

- Ceres seq_id 1599563
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1965

- Ceres seq_id 1599564

- Location of start within SEQ ID NO 1964: at 98 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 946

- gi No. 1352605 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 18 KD SUBUNIT 
(COMPLEX I-18KD) (CI-18KD) 

- % Identity: 86.7 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 1965: from 29 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1966 

- Ceres seq_id 1599565

- Location of start within SEQ ID NO 1964: at 167 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 947 

- gi No. 1352605 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 18 KD SUBUNIT 
(COMPLEX I-18KD) (CI-18KD) 

- % Identity: 86.7 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 1966: from 6 to 35 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1967 

- Ceres seq_id 1599566 

- Location of start within SEQ ID NO 1964: at 224 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 948 

- gi No. 1352605 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 18 KD SUBUNIT 
(COMPLEX I-18KD) (CI-18KD) 

- % Identity: 86.7 

- Alignment Length: 30 

- Location of Alignment in SEQ ID NO 1967: from 1 to 16 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322622 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1968 

- Ceres seq_id 1599574 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1969 

- Ceres seq_id 1599575 

- Location of start within SEQ ID NO 1968: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1970 

- Ceres seq_id 1599576 

- Location of start within SEQ ID NO 1968: at 53 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 949

- gi No. 733454 

- Description: (U23188) chlorophyll a/b-binding apoprotein CP26 
precursor [Zea mays]

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 1970: from 1 to 11 

Maximum Length Sequence:

related to: 
Clone IDs: 

322869 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1971 

- Ceres seq_id 1599588
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1972 

- Ceres seq_id 1599589 

- Location of start within SEQ ID NO 1971: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Chitin recognition protein 

- Location within SEQ ID NO 1972: from 45 to 78 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 950 

- gi No. 299189 

- Description: Ac-AMP2=antimicrobial peptide [Amaranthus caudatus, 
seeds, Peptide, 30 aa] >gi|1431748|pdb|1MMC| 1h Nmr Study Of The Solution
Structure Of Ac-Amp2 

- % Identity: 71.4 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 1972: from 50 to 70 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1973

- Ceres seq_id 1599590

- Location of start within SEQ ID NO 1971: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1974

- Ceres seq_id 1599591

- Location of start within SEQ ID NO 1971: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to:
Clone IDs: 

323272 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1975 

- Ceres seq_id 1599617 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1976 

- Ceres seq_id 1599618 

- Location of start within SEQ ID NO 1975: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 951 

- gi No. 1173045 

- Description: 60S RIBOSOMAL PROTEIN L37A >gi|421866|pir||S34661
ribosomal protein L37a - turnip >gi|347062 (L21897) ribosomal protein
[Brassica rapa] >gi|395077|emb|CAA80864| (Z24739) ribosomal protein L37a
[Brassica rapa]

- % Identity: 85.4

- Alignment Length: 82

- Location of Alignment in SEQ ID NO 1976: from 32 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1977 

- Ceres seq_id 1599619 

- Location of start within SEQ ID NO 1975: at 95 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 952 

- gi No. 1173045 

- Description: 60S RIBOSOMAL PROTEIN L37A >gi|421866|pir||S34661
ribosomal protein L37a - turnip >gi|347062 (L21897) ribosomal protein
[Brassica rapa] >gi|395077|emb|CAA80864| (Z24739) ribosomal protein L37a
[Brassica rapa] 

- % Identity: 85.4 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 1977: from 1 to 82 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323409 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1978 

- Ceres seq_id 1599628 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1979

- Ceres seq_id 1599629 

- Location of start within SEQ ID NO 1978: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1980

- Ceres seq_id 1599630

- Location of start within SEQ ID NO 1978: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1981

- Ceres seq_id 1599631

- Location of start within SEQ ID NO 1978: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal L39 protein

- Location within SEQ ID NO 1981: from 35 to 77 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 953 

- gi No. 1710551 

- Description: 60S RIBOSOMAL PROTEIN L39 

>gi|1177369|emb|CAA64728.1| (X95458) ribosomal protein L39 [Zea mays]

- % Identity: 100 

- Alignment Length: 51 

- Location of Alignment in SEQ ID NO 1981: from 27 to 77 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323590 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1982 

- Ceres seq_id 1599638 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1983 

- Ceres seq_id 1599639

- Location of start within SEQ ID NO 1982: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp20/alpha crystallin family 

- Location within SEQ ID NO 1983: from 80 to 114 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 954 

- gi No. 100883 

- Description: heat shock protein 17.2 - maize 
>gi[22335|embjCAA46641| (X65725) heat shock protein 17.2 [Zea mays] 

- % Identity: 96.2 

- Alignment Length: 78

- Location of Alignment in SEQ ID NO 1983: from 46 to 122 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1984 

- Ceres seq_id 1599640 

- Location of start within SEQ ID NO 1982: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 1985 

- Ceres seq_id 1599641 

- Location of start within SEQ ID NO 1982: at 159 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp20/alpha crystallin family 

- Location within SEQ ID NO 1985: from 28 to 62 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 955 

- gi No. 100883 

- Description: heat shock protein 17.2 - maize 

>gi|22335|emb|CAA46641| (X65725) heat shock protein 17.2 [Zea mays]

- % Identity: 96.2 

- Alignment Length: 78

- Location of Alignment in SEQ ID NO 1985: from 1 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323748 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1986 

- Ceres seq_id 1599657 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1987 

- Ceres seq_id 1599658 

- Location of start within SEQ ID NO 1986: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 956 

- gi No. 5732069 

- Description: (AF147263) contains similarity to Pfam family 
PF00036 - EF hand; score=11.7, E=0.66,N=1 [Arabidopsis thaliana] 

- % Identity: 81.4 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 1987: from 2 to 65 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1988 

- Ceres seq_id 1599659 

- Location of start within SEQ ID NO 1986: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323843 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1989

- Ceres seq_id 1599663 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1990

- Ceres seq_id 1599664

- Location of start within SEQ ID NO 1989: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 957 

- gi No. 2244772 

- Description: (Z97335) transport protein [Arabidopsis thaliana] 

- % Identity: 79.2 

- Alignment Length: 72 

- Location of Alignment in SEQ ID NO 1990: from 1 to 72 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1991 

- Ceres seq_id 1599665

- Location of start within SEQ ID NO 1989: at 54 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 958 

- gi No. 2244772 

- Description: (Z97335) transport protein [Arabidopsis thaliana] 

- % Identity: 79.2 

- Alignment Length: 72

- Location of Alignment in SEQ ID NO 1991: from 1 to 55 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1992 

- Ceres seq_id 1599666

- Location of start within SEQ ID NO 1989: at 193 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 959 

- gi No. 3935152 

- Description: (AC005106) T25N20.16 [Arabidopsis thaliana] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 1992: from 11 to 29 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324028 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1993 

- Ceres seq_id 1599678 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1994 

- Ceres seq_id 1599679 

- Location of start within SEQ ID NO 1993: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1995

- Ceres seq_id 1599680

- Location of start within SEQ ID NO 1993: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal L39 protein

- Location within SEQ ID NO 1995: from 29 to 71 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 960 

- gi No. 1710551 

- Description: 60S RIBOSOMAL PROTEIN L39 

>gi|1177369|emb|CAA64728.1| (X95458) ribosomal protein L39 [Zea mays]

- % Identity: 98 

- Alignment Length: 51 

- Location of Alignment in SEQ ID NO 1995: from 21 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1996 

- Ceres seq_id 1599681 

- Location of start within SEQ ID NO 1993: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal L39 protein 

- Location within SEQ ID NO 1996: from 9 to 51 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 961 

- gi No. 1710551 

- Description: 60S RIBOSOMAL PROTEIN L39 

>gi|1177369|emb|CAA64728.1| (X95458) ribosomal protein L39 [Zea mays]

- % Identity: 98 

- Alignment Length: 51 

- Location of Alignment in SEQ ID NO 1996: from 1 to 51 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324052 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 1997

- Ceres seq_id 1599686 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 1998 

- Ceres seq_id 1599687 

- Location of start within SEQ ID NO 1997: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 962 

- gi No. 688080 

- Description: lectin=chitin-binding protein [Solanum 
tuberosum=potatoes, tubers, Peptide Partial, 27 aa] 

- % Identity: 73.3 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 1998: from 26 to 40

Maximum Length Sequence:

related to:
Clone IDs:

324064 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 1999
- Ceres seq_id 1599692 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2000 

- Ceres seq_id 1599693 

- Location of start within SEQ ID NO 1999: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2001

- Ceres seq_id 1599694 

- Location of start within SEQ ID NO 1999: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 963 

- gi No. 2130129 

- Description: glucose starvation-induced protein (clone pZSS3) - 
maize (fragment) >gi|575426|emb|CAA57939| (X82617) sugar-starvation induced
protein [Zea mays] 

- % Identity: 93.6 

- Alignment Length: 47 

- Location of Alignment in SEQ ID NO 2001: from 85 to 130 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324131 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2002 

- Ceres seq_id 1599697 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2003 

- Ceres seq_id 1599698 

- Location of start within SEQ ID NO 2002: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L30 

- Location within SEQ ID NO 2003: from 65 to 116 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

324211 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2004 

- Ceres seq_id 1599712 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2005 

- Ceres seq_id 1599713 

- Location of start within SEQ ID NO 2004: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 964 

- gi No. 585963 

- Description: PROTEIN TRANSPORT PROTEIN SEC61 GAMMA SUBUNIT 

- % Identity: 100 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2005: from 19 to 30 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2006 

- Ceres seq_id 1599714 

- Location of start within SEQ ID NO 2004: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2007 

- Ceres seq_id 1599715 

- Location of start within SEQ ID NO 2004: at 56 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 965

- gi No. 585963 

- Description: PROTEIN TRANSPORT PROTEIN SEC61 GAMMA SUBUNIT 

- % Identity: 100 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2007: from 1 to 12 

Maximum Length Sequence:

related to: 
Clone IDs: 

324237 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2008 

- Ceres seq_id 1599724 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2009 

- Ceres seq_id 1599725 

- Location of start within SEQ ID NO 2008: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Signal carboxyl-terminal domain 

- Location within SEQ ID NO 2009: from 1 to 75 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 966

- gi No. 2281705 

- Description: (AF013979) ethylene responsive factor [Oryza sativa] 

- % Identity: 73.2 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2009: from 1 to 125

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2010

- Ceres seq_id 1599726

- Location of start within SEQ ID NO 2008: at 61 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2011 

- Ceres seq_id 1599727 

- Location of start within SEQ ID NO 2008: at 159 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 967 

- gi No. 2281705 

- Description: (AF013979) ethylene responsive factor [Oryza sativa] 

- % Identity: 73.2 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2011: from 1 to 73 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324278 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2012 

- Ceres seq_id 1599732 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2013 

- Ceres seq_id 1599733 

- Location of start within SEQ ID NO 2012: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 968 

- gi No. 480897 

- Description: gene msg1 protein - mouse >gi|406257|emb|CAA50636|
(X71629) msg1 [Mus musculus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2013: from 18 to 28

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2014 

- Ceres seq_id 1599734 

- Location of start within SEQ ID NO 2012: at 51 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2015 

- Ceres seq_id 1599735 

- Location of start within SEQ ID NO 2012: at 60 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324313 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2016 

- Ceres seq_id 1599740 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2017 

- Ceres seq_id 1599741 

- Location of start within SEQ ID NO 2016: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 969 

- gi No. 3876913 

- Description: (Z92834) similar to 40S ribosomal protein S26; cDNA 
EST yk329c3.3 comes from this gene; cDNA EST yk409f4.5 comes from this gene;
cDNA EST yk504d10.3 comes from this gene; cDNA EST yk495a3.3 comes from this
gene; cDNA EST yk442b...

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2017: from 34 to 47 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2018 

- Ceres seq_id 1599742

- Location of start within SEQ ID NO 2016: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324347 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2019 

- Ceres seq_id 1599746 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2020

- Ceres seq_id 1599747

- Location of start within SEQ ID NO 2019: at 146 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 970

- gi No. 1711507 

- Description: SIGNAL RECOGNITION PARTICLE 19 KD PROTEIN (SRP19) 
>gi|624221 (U19030) signal recognition particle 19 kDa protein subunit SRP19
[Oryza sativa] 

- % Identity: 81.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 2020: from 13 to 73

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2021

- Ceres seq_id 1599748

- Location of start within SEQ ID NO 2019: at 170 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 971 

- gi No. 1711507 

- Description: SIGNAL RECOGNITION PARTICLE 19 KD PROTEIN (SRP19) 
>gi|624221 (U19030) signal recognition particle 19 kDa protein subunit SRP19
[Oryza sativa] 

- % Identity: 81.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 2021: from 5 to 65 

Maximum Length Sequence:

related to: 
Clone IDs: 

324360 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2022 

- Ceres seq_id 1599756
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2023 

- Ceres seq_id 1599757 

- Location of start within SEQ ID NO 2022: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2024 

- Ceres seq_id 1599758 

- Location of start within SEQ ID NO 2022: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 972

- gi No. 133867 

- Description: 40S RIBOSOMAL PROTEIN S11 >gi|82722|pir||S16577
ribosomal protein S11 - maize >gi|22470|emb|CAA39438| (X55967) ribosomal
protein S11 [Zea mays]

- % Identity: 98 

- Alignment Length: 100 

- Location of Alignment in SEQ ID NO 2024: from 30 to 129 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324408 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2025 

- Ceres seq_id 1599770 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2026 

- Ceres seq_id 1599771

- Location of start within SEQ ID NO 2025: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 2026: from 56 to 138 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2027 

- Ceres seq_id 1599772 

- Location of start within SEQ ID NO 2025: at 54 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 2027: from 39 to 121 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2028

- Ceres seq_id 1599773

- Location of start within SEQ ID NO 2025: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 2028: from 19 to 101 aa.



Maximum Length Sequence: 

related to: 
Clone IDs: 

324579 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2029 

- Ceres seq_id 1599795 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2030 

- Ceres seq_id 1599796

- Location of start within SEQ ID NO 2029: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Calreticulin family 

- Location within SEQ ID NO 2030: from 1 to 137 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 973

- gi No. 1181331 

- Description: (X77569) calnexin [Zea mays] 

- % Identity: 96.5 

- Alignment Length: 143

- Location of Alignment in SEQ ID NO 2030: from 1 to 142 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2031 

- Ceres seq_id 1599797

- Location of start within SEQ ID NO 2029: at 22 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Calreticulin family 

- Location within SEQ ID NO 2031: from 1 to 130 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 974

- gi No. 1181331 

- Description: (X77569) calnexin [Zea mays] 

- % Identity: 96.5 

- Alignment Length: 143 

- Location of Alignment in SEQ ID NO 2031: from 1 to 135 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2032 

- Ceres seq_id 1599798 

- Location of start within SEQ ID NO 2029: at 211 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Calreticulin family 

- Location within SEQ ID NO 2032: from 1 to 67 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 975

- gi No. 1181331 

- Description: (X77569) calnexin [Zea mays] 

- % Identity: 96.5 

- Alignment Length: 143

- Location of Alignment in SEQ ID NO 2032: from 1 to 72 

Maximum Length Sequence:

related to: 
Clone IDs: 

324696 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2033 

- Ceres seq_id 1599813 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2034 

- Ceres seq_id 1599814 

- Location of start within SEQ ID NO 2033: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2035 

- Ceres seq_id 1599815

- Location of start within SEQ ID NO 2033: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2035: from 39 to 67 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 976

- gi No. 170076

- Description: (L01433) calmodulin [Glycine max] 
>gi|1583770|prf||2121384D calmodulin [Glycine max]

- % Identity: 73.7 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2035: from 30 to 67 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2036 

- Ceres seq_id 1599816 

- Location of start within SEQ ID NO 2033: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2036: from 11 to 39 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 977 

- gi No. 170076 

- Description: (L01433) calmodulin [Glycine max] 
>gi|1583770|prf||2121384D calmodulin [Glycine max]

- % Identity: 73.7 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2036: from 2 to 39 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324790 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2037 

- Ceres seq_id 1599829 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2038 

- Ceres seq_id 1599830

- Location of start within SEQ ID NO 2037: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribulose bisphosphate carboxylase, small chain 

- Location within SEQ ID NO 2038: from 97 to 151 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 978 

- gi No. 132147 

- Description: RIBULOSE BISPHOSPHATE CARBOXYLASE SMALL CHAIN 
PRECURSOR (RUBISCO SMALL SUBUNIT) >gi|68089|pir||RKZMS ribulose-bisphosphate
carboxylase (EC 4.1.1.39) small chain precursor - maize 

>gi|22474|emb|CAA29784| (X06535)

- % Identity: 92.9 

- Alignment Length: 56

- Location of Alignment in SEQ ID NO 2038: from 97 to 151 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2039 

- Ceres seq_id 1599831 

- Location of start within SEQ ID NO 2037: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 979

- gi No. 132147 

- Description: RIBULOSE BISPHOSPHATE CARBOXYLASE SMALL CHAIN 
PRECURSOR (RUBISCO SMALL SUBUNIT) >gi|68089|pir||RKZMS ribulose-bisphosphate
carboxylase (EC 4.1.1.39) small chain precursor - maize 

>gi|22474|emb|CAA29784| (X06535)

- % Identity: 98.7 

- Alignment Length: 76

- Location of Alignment in SEQ ID NO 2039: from 21 to 96 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2040 

- Ceres seq_id 1599832

- Location of start within SEQ ID NO 2037: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 980

- gi No. 132147 

- Description: RIBULOSE BISPHOSPHATE CARBOXYLASE SMALL CHAIN 
PRECURSOR (RUBISCO SMALL SUBUNIT) >gi|68089|pir||RKZMS ribulose-bisphosphate
carboxylase (EC 4.1.1.39) small chain precursor - maize 

>gi|22474|emb|CAA29784| (X06535)

- % Identity: 98.7 

- Alignment Length: 76

- Location of Alignment in SEQ ID NO 2040: from 1 to 76 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324832 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2041 

- Ceres seq_id 1599833 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2042 

- Ceres seq_id 1599834 

- Location of start within SEQ ID NO 2041: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 11-S plant seed storage protein 

- Location within SEQ ID NO 2042: from 56 to 113 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 981 

- gi No. 4506461 

- Description: ref|NP_002895.1|pRD| Radin blood group
>gi|86883|pir||JH0189 arginine/aspartate-rich 37.3K protein - human
>gi|35913|emb|CAA34231| (X16105) RD protein (AA 1-325) [Homo sapiens]

- % Identity: 71.1 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2042: from 52 to 89 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2043 

- Ceres seq_id 1599835 

- Location of start within SEQ ID NO 2041: at 58 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 11-S plant seed storage protein

- Location within SEQ ID NO 2043: from 37 to 94 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 982 

- gi No. 4506461 

- Description: ref|NP_002895.1|pRD| Radin blood group
>gi|86883|pir||JH0189 arginine/aspartate-rich 37.3K protein - human
>gi|35913|emb|CAA34231| (X16105) RD protein (AA 1-325) [Homo sapiens]

- % Identity: 71.1 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2043: from 33 to 70 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2044 

- Ceres seq_id 1599836 

- Location of start within SEQ ID NO 2041: at 139 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 11-S plant seed storage protein

- Location within SEQ ID NO 2044: from 10 to 67 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 983 

- gi No. 4506461 

- Description: ref|NP_002895.1|pRD| Radin blood group
>gi|86883|pir||JH0189 arginine/aspartate-rich 37.3K protein - human
>gi|35913|emb|CAA34231| (X16105) RD protein (AA 1-325) [Homo sapiens]

- % Identity: 71.1 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2044: from 6 to 43 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324871 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2045 

- Ceres seq_id 1599853
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2046 

- Ceres seq_id 1599854 

- Location of start within SEQ ID NO 2045: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Mucin-like glycoprotein

- Location within SEQ ID NO 2046: from 40 to 135 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 984

- gi No. 688080 

- Description: lectin=chitin-binding protein [Solanum 
tuberosum=potatoes, tubers, Peptide Partial, 27 aa]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2046: from 81 to 91

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2047

- Ceres seq_id 1599855

- Location of start within SEQ ID NO 2045: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2048 

- Ceres seq_id 1599856 

- Location of start within SEQ ID NO 2045: at 65 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

324904 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2049 

- Ceres seq_id 1599861
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2050

- Ceres seq_id 1599862

- Location of start within SEQ ID NO 2049: at 101 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2051 

- Ceres seq_id 1599863

- Location of start within SEQ ID NO 2049: at 261 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp90 protein

- Location within SEQ ID NO 2051: from 1 to 64 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 985

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa] 

- % Identity: 95.2 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 2051: from 1 to 64 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2052 

- Ceres seq_id 1599864

- Location of start within SEQ ID NO 2049: at 273 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Hsp90 protein

- Location within SEQ ID NO 2052: from 1 to 60 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 986

- gi No. 417154 

- Description: HEAT SHOCK PROTEIN 82 >gi|100685|pir||S25541 heat
shock protein 82 - rice (strain Taichung Native One) >gi|20256|emb|CAA77978|
(Z11920) heat shock protein 82 (HSP82) [Oryza sativa] 

- % Identity: 95.2 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 2052: from 1 to 60 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324954 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2053 

- Ceres seq_id 1599885 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2054 

- Ceres seq_id 1599886 

- Location of start within SEQ ID NO 2053: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 987 

- gi No. 3342242 

- Description: (AF030421) cell wall invertase; beta- 
fructofuranosidase; fructosidase [Triticum aestivum] 

- % Identity: 78.1 

- Alignment Length: 64 

- Location of Alignment in SEQ ID NO 2054: from 1 to 64 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324988 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2055 

- Ceres seq_id 1599891 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2056 

- Ceres seq_id 1599892 

- Location of start within SEQ ID NO 2055: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2057 

- Ceres seq_id 1599893 

- Location of start within SEQ ID NO 2055: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 988

- gi No. 4512263

- Description: (AB018526) H+/Ca2+ exchanger 2 [Ipomoea nil] 

- % Identity: 83.7 

- Alignment Length: 43

- Location of Alignment in SEQ ID NO 2057: from 1 to 43 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2058 

- Ceres seq_id 1599894 

- Location of start within SEQ ID NO 2055: at 24 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 989

- gi No. 4512263 

- Description: (AB018526) H+/Ca2+ exchanger 2 [Ipomoea nil] 

- % Identity: 83.7 

- Alignment Length: 43

- Location of Alignment in SEQ ID NO 2058: from 1 to 36 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325040 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2059 

- Ceres seq_id 1599910 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2060 

- Ceres seq_id 1599911 

- Location of start within SEQ ID NO 2059: at 145 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Heme-binding domain in cytochrome b5 and oxidoreductases 

- Location within SEQ ID NO 2060: from 7 to 69 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 990 

- gi No. 729252 

- Description: CYTOCHROME B5 >gi 1167140 (M87514) cytochrome b-5 
[Brassica oleracea] >gi|384338|prf||1905426A cytochrome b5 [Brassica
oleracea] 

- % Identity: 77.1 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 2060: from 1 to 69 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325061 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2061 

- Ceres seq_id 1599912 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2062

- Ceres seq_id 1599913

- Location of start within SEQ ID NO 2061: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2063 

- Ceres seq_id 1599914

- Location of start within SEQ ID NO 2061: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2064 

- Ceres seq_id 1599915 

- Location of start within SEQ ID NO 2061: at 256 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 991 

- gi No. 1085973 

- Description: isopentyl pyrophosphate isomerase - Clarkia brewe 
(fragment) >gi|572635|emb|CAA57947| (X82627) isopentenyl pyrophosphate
isomerase [Clarkia breweri] 

- % Identity: 91.7 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 2064: from 1 to 21 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325141 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2065 

- Ceres seq_id 1599942 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2066 

- Ceres seq_id 1599943 

- Location of start within SEQ ID NO 2065: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2067 

- Ceres seq_id 1599944 

- Location of start within SEQ ID NO 2065: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2068 

- Ceres seq_id 1599945 

- Location of start within SEQ ID NO 2065: at 174 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 992 

- gi No. 4038471 

- Description: (AF111029) 40S ribosomal protein S27 homolog [Zea
mays]

- % Identity: 100 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 2068: from 1 to 86 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325151 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2069 

- Ceres seq_id 1599954
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2070 

- Ceres seq_id 1599955 

- Location of start within SEQ ID NO 2069: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 993 

- gi No. 2668750 

- Description: (AF034949) ribosomal protein L30 [Zea mays]

- % Identity: 100 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 2070: from 32 to 88 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2071 

- Ceres seq_id 1599956 

- Location of start within SEQ ID NO 2069: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2072 

- Ceres seq_id 1599957 

- Location of start within SEQ ID NO 2069: at 94 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 994 

- gi No. 2668750 

- Description: (AF034949) ribosomal protein L30 [Zea mays] 

- % Identity: 100 

- Alignment Length: 57 

- Location of Alignment in SEQ ID NO 2072: from 1 to 57 

Maximum Length Sequence:

related to: 
Clone IDs: 

325227

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2073 

- Ceres seq_id 1599969 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2074

- Ceres seq_id 1599970

- Location of start within SEQ ID NO 2073: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 995 

- gi No. 293338 

- Description: (L12703) engrailed protein [Mus musculus] 

- % Identity: 91.7 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2074: from 4 to 15 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2075 

- Ceres seq_id 1599971 

- Location of start within SEQ ID NO 2073: at 124 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325241 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2076 

- Ceres seq_id 1599976 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2077 

- Ceres seq_id 1599977 

- Location of start within SEQ ID NO 2076: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 2077: from 32 to 103 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 996 

- gi No. 2293566 

- Description: (AF012896) ADP-ribosylation factor 1 [Oryza sativa] 

- % Identity: 95.9 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 2077: from 31 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2078 

- Ceres seq_id 1599978 

- Location of start within SEQ ID NO 2076: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 997

- gi No. 4262637

- Description: (AF125964) contains similarity to collagens 
[Caenorhabditis elegans] 

- % Identity: 83.3 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2078: from 26 to 37 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2079

- Ceres seq_id 1599979

- Location of start within SEQ ID NO 2076: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- ADP-ribosylation factor family 

- Location within SEQ ID NO 2079: from 2 to 73 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 998 

- gi No. 2293566 

- Description: (AF012896) ADP-ribosylation factor 1 [Oryza sativa] 

- % Identity: 95.9 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 2079: from 1 to 73

Maximum Length Sequence: 

related to: 
Clone IDs: 

325289 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2080 

- Ceres seq_id 1599983 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2081 

- Ceres seq_id 1599984

- Location of start within SEQ ID NO 2080: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2082 

- Ceres seq_id 1599985 

- Location of start within SEQ ID NO 2080: at 72 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 999 

- gi No. 2500378 

- Description: 60S RIBOSOMAL PROTEIN L37 

- % Identity: 80.5 

- Alignment Length: 82 

- Location of Alignment in SEQ ID NO 2082: from 1 to 82 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325339 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 2083

- Ceres seq_id 1599992
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2084 

- Ceres seq_id 1599993 

- Location of start within SEQ ID NO 2083: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1000

- gi No. 3747050 

- Description: (AF093540) ribosomal protein L26 [Zea mays] 

- % Identity: 100 

- Alignment Length: 33 

- Location of Alignment in SEQ ID NO 2084: from 48 to 80 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2085

- Ceres seq_id 1599994

- Location of start within SEQ ID NO 2083: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1001 

- gi No. 3747050 

- Description: (AF093540) ribosomal protein L26 [Zea mays] 

- % Identity: 94.1 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2085: from 30 to 46 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2086 

- Ceres seq_id 1599995 

- Location of start within SEQ ID NO 2083: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1002 

- gi No. 3747050 

- Description: (AF093540) ribosomal protein L26 [Zea mays] 

- % Identity: 94.1 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2086: from 1 to 17

Maximum Length Sequence:

related to: 
Clone IDs: 

325551 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2087 

- Ceres seq_id 1600027
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2088 

- Ceres seq_id 1600028

- Location of start within SEQ ID NO 2087: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1003 

- gi No. 3122673 

- Description: 60S RIBOSOMAL PROTEIN LIS 

>gi|2245027|emb|CAB10447.1| (Z97341) ribosomal protein [Arabidopsis thaliana]

- % Identity: 91.7 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 2088: from 29 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2089 

- Ceres seq_id 1600029 

- Location of start within SEQ ID NO 2087: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325605 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2090 

- Ceres seq_id 1600039
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2091 

- Ceres seq_id 1600040 

- Location of start within SEQ ID NO 2090: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1004 

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 74.1 

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 2091: from 18 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2092 

- Ceres seq_id 1600041

- Location of start within SEQ ID NO 2090: at 188 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1005 

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 85.4 

- Alignment Length: 41 

- Location of Alignment in SEQ ID NO 2092: from 11 to 50 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325656 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2093

- Ceres seq_id 1600059
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2094 

- Ceres seq_id 1600060 

- Location of start within SEQ ID NO 2093: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2095 

- Ceres seq_id 1600061 

- Location of start within SEQ ID NO 2093: at 172 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2095: from 41 to 69 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1006

- gi No. 1864024 

- Description: (U35683) calcium-binding pollen allergen [Cynodon 
dactylon] >gi|1871507|emb|CAA62634| (X91256) calcium-binding pollen allergen
[Cynodon dactylon] 

- % Identity: 90 

- Alignment Length: 80

- Location of Alignment in SEQ ID NO 2095: from 1 to 80 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2096 

- Ceres seq_id 1600062 

- Location of start within SEQ ID NO 2093: at 190 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- EF hand 

- Location within SEQ ID NO 2096: from 35 to 63 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1007 

- gi No. 1864024 

- Description: (U35683) calcium-binding pollen allergen [Cynodon 
dactylon] >gi|1871507|emb|CAA62634| (X91256) calcium-binding pollen allergen
[Cynodon dactylon] 

- % Identity: 90 

- Alignment Length: 80

- Location of Alignment in SEQ ID NO 2096: from 1 to 74 

Maximum Length Sequence:

related to: 
Clone IDs: 

325682 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2097 

- Ceres seq_id 1600071 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2098 

- Ceres seq_id 1600072 

- Location of start within SEQ ID NO 2097: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2099 

- Ceres seq_id 1600073 

- Location of start within SEQ ID NO 2097: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Peroxidase 

- Location within SEQ ID NO 2099: from 87 to 143 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1008 

- gi No. 1076800 

- Description: L-ascorbate peroxidase (EC 1.11.1.11), cytosolic 
isozyme - maize >gi|600116|emb|CAA84406| (Z34934) cytosolic ascorbate
peroxidase [Zea mays] >gi|1096503|prf||2111423A ascorbate peroxidase [Zea
mays] 

- % Identity: 82.5 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2099: from 30 to 143 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2100 

- Ceres seq_id 1600074 

- Location of start within SEQ ID NO 2097: at 89 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peroxidase 

- Location within SEQ ID NO 2100: from 58 to 114 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1009

- gi No. 1076800 

- Description: L-ascorbate peroxidase (EC 1.11.1.11), cytosolic 
isozyme - maize >gi|600116|emb|CAA84406| (Z34934) cytosolic ascorbate
peroxidase [Zea mays] >gi|1096503|prf||2111423A ascorbate peroxidase [Zea
mays] 

- % Identity: 82.5 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2100: from 1 to 114 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325698 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2101 

- Ceres seq_id 1600077 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2102 

- Ceres seq_id 1600078 

- Location of start within SEQ ID NO 2101: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin family

- Location within SEQ ID NO 2102: from 22 to 74 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2103 

- Ceres seq_id 1600079 

- Location of start within SEQ ID NO 2101: at 20 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin family 

- Location within SEQ ID NO 2103: from 16 to 68 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2104 

- Ceres seq_id 1600080

- Location of start within SEQ ID NO 2101: at 152 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325745 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2105 

- Ceres seq_id 1600093 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2106 

- Ceres seq_id 1600094 

- Location of start within SEQ ID NO 2105: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2107 

- Ceres seq_id 1600095 

- Location of start within SEQ ID NO 2105: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1010 

- gi No. 225142 

- Description: chorion protein A6 [Bombyx mori] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2107: from 25 to 38 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2108 

- Ceres seq_id 1600096 

- Location of start within SEQ ID NO 2105: at 67 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325777 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2109 

- Ceres seq_id 1600105
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2110 

- Ceres seq_id 1600106 

- Location of start within SEQ ID NO 2109: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1011 

- gi No. 2605619 

- Description: (D88618) OSMYB2 [Oryza sativa] 

- % Identity: 91.1 

- Alignment Length: 45

- Location of Alignment in SEQ ID NO 2110: from 53 to 97 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2111 

- Ceres seq_id 1600107 

- Location of start within SEQ ID NO 2109: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1012 

- gi No. 2605619 

- Description: (D88618) OSMYB2 [Oryza sativa] 

- % Identity: 81.3 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 2111: from 125 to 139 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2112 

- Ceres seq_id 1600108 

- Location of start within SEQ ID NO 2109: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Myb-like DNA-binding domain 

- Location within SEQ ID NO 2112: from 78 to 112 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1013 

- gi No. 2605619 

- Description: (D88618) OSMYB2 [Oryza sativa] 

- % Identity: 96.4 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 2112: from 97 to 124

Maximum Length Sequence:

related to:
Clone IDs: 

325798 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2113 

- Ceres seq_id 1600113 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2114 

- Ceres seq_id 1600114 

- Location of start within SEQ ID NO 2113: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2115 

- Ceres seq_id 1600115 

- Location of start within SEQ ID NO 2113: at 93 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2116 

- Ceres seq_id 1600116 

- Location of start within SEQ ID NO 2113: at 139 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Protamine P1

- Location within SEQ ID NO 2116: from 3 to 45 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1014 

- gi No. 266845 

- Description: PROTAMINE Z3 (SCYLLIORHININE Z3) 
>gi|422544|pir||S29829 protamine Z3 - Scyliorhinus canicula
>gi|299275|bbs|127327 protamine Z3 [Scylliorhinus caniculus=dog-fish,
Peptide, 37 aa] >gi|445971|prf||1911213A protamine Z3 [Scyliorhinus canicula]

- % Identity: 70.4 

- Alignment Length: 27 

- Location of Alignment in SEQ ID NO 2116: from 22 to 46 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325807 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2117 

- Ceres seq_id 1600117 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2118 

- Ceres seq_id 1600118 

- Location of start within SEQ ID NO 2117: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2119 

- Ceres seq_id 1600119 

- Location of start within SEQ ID NO 2117: at 214 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2120 

- Ceres seq_id 1600120 

- Location of start within SEQ ID NO 2117: at 308 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1015 

- gi No. 3334346 

- Description: PROTEIN TRANSLATION FACTOR SUI1 HOMOLOG 
>gi|2852445|dbj|BAA24697| (AB003378) SUI1 homolog [Salix bakko]

- % Identity: 79.2 

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 2120: from 11 to 33 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325891 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2121 

- Ceres seq_id 1600147 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2122 

- Ceres seq_id 1600148 

- Location of start within SEQ ID NO 2121: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1016 

- gi No. 1277180 

- Description: (U52099) ZmRB [Zea mays] 

- % Identity: 88.6 

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 2122: from 2 to 80 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2123 

- Ceres seq_id 1600149 

- Location of start within SEQ ID NO 2121: at 189 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1017 

- gi No. 1277180 

- Description: (U52099) ZmRB [Zea mays] 

- % Identity: 87.8 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2123: from 20 to 67

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2124 

- Ceres seq_id 1600150

- Location of start within SEQ ID NO 2121: at 213 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1018 

- gi No. 1277180 

- Description: (U52099) ZmRB [Zea mays] 

- % Identity: 87.8 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2124: from 12 to 59

Maximum Length Sequence:

related to: 
Clone IDs: 

325914 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2125 

- Ceres seq_id 1600160 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2126 

- Ceres seq_id 1600161 

- Location of start within SEQ ID NO 2125: at 104 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1019 

- gi No. 5091623 

- Description: (AC007454) Similar to gb|U93048 somatic 
embryogenesis receptor-like kinase from Daucus carota, contains 4 PF|00560
Leucine Rich Repeat domains and a PF|00069 Eukaryotic protein kinase domain.
[Arabidopsis thaliana] 

- % Identity: 72.1 

- Alignment Length: 104

- Location of Alignment in SEQ ID NO 2126: from 1 to 92

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2127 

- Ceres seq_id 1600162 

- Location of start within SEQ ID NO 2125: at 212 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1020 

- gi No. 5091623 

- Description: (AC007454) Similar to gb|U93048 somatic 
embryogenesis receptor-like kinase from Daucus carota, contains 4 PF|00560
Leucine Rich Repeat domains and a PF|00069 Eukaryotic protein kinase domain.
[Arabidopsis thaliana] 

- % Identity: 72.1 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2127: from 1 to 56

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2128

- Ceres seq_id 1600163

- Location of start within SEQ ID NO 2125: at 230 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1021 

- gi No. 5091623 

- Description: (AC007454) Similar to gb|U93048 somatic 
embryogenesis receptor-like kinase from Daucus carota, contains 4 PF|00560
Leucine Rich Repeat domains and a PF|00069 Eukaryotic protein kinase domain.
[Arabidopsis thaliana] 

- % Identity: 72.1 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2128: from 1 to 50 

Maximum Length Sequence:

related to: 
Clone IDs: 

325973 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2129 

- Ceres seq_id 1600189 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2130 

- Ceres seq_id 1600190 

- Location of start within SEQ ID NO 2129: at 91 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1022 

- gi No. 2257756 

- Description: (U82815) nucleolar histone deacetylase HD2-p39 [Zea 
mays] >gi|3650466 (AF026917) histone deacetylase HD2-p39 [Zea mays]

- % Identity: 99.1 

- Alignment Length: 109

- Location of Alignment in SEQ ID NO 2130: from 1 to 108 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2131 

- Ceres seq_id 1600191

- Location of start within SEQ ID NO 2129: at 214 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1023

- gi No. 2257756 

- Description: (U82815) nucleolar histone deacetylase HD2-p39 [Zea 
mays] >gi|3650466 (AF026917) histone deacetylase HD2-p39 [Zea mays]

- % Identity: 99.1 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 2131: from 1 to 67 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325987 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2132

- Ceres seq_id 1600196
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2133 

- Ceres seq_id 1600197 

- Location of start within SEQ ID NO 2132: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2134 

- Ceres seq_id 1600198 

- Location of start within SEQ ID NO 2132: at 103 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1024

- gi No. 2130122

- Description: cyclin III - maize >gi|516548 (U10076) cyclin IIIZm
[Zea mays]

- % Identity: 92.7 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2134: from 1 to 107 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2135 

- Ceres seq_id 1600199 

- Location of start within SEQ ID NO 2132: at 145 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 1025

- gi No. 2130122

- Description: cyclin III - maize >gi|516548 (U10076) cyclin IIIZm
[Zea mays]

- % Identity: 92.7

- Alignment Length: 110

- Location of Alignment in SEQ ID NO 2135: from 1 to 93

Maximum Length Sequence:

related to:
Clone IDs:

326018

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2136

- Ceres seq_id 1600201
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2137

- Ceres seq_id 1600202

- Location of start within SEQ ID NO 2136: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal S17

- Location within SEQ ID NO 2137: from 39 to 107 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1026

- gi No. 1350944

- Description: 40S RIBOSOMAL PROTEIN S17 

- % Identity: 88.7 

- Alignment Length: 71 

- Location of Alignment in SEQ ID NO 2137: from 38 to 107 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2138 

- Ceres seq_id 1600203 

- Location of start within SEQ ID NO 2136: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal S17 

- Location within SEQ ID NO 2138: from 2 to 70 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1027 

- gi No. 1350944 

- Description: 40S RIBOSOMAL PROTEIN S17 

- % Identity: 88.7 

- Alignment Length: 71 

- Location of Alignment in SEQ ID NO 2138: from 1 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 

326124 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2139 

- Ceres seq_id 1600215 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2140 

- Ceres seq_id 1600216 

- Location of start within SEQ ID NO 2139: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2141 

- Ceres seq_id 1600217 

- Location of start within SEQ ID NO 2139: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2142 

- Ceres seq_id 1600218 

- Location of start within SEQ ID NO 2139: at 137 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SRF-type transcription factor (DNA-binding and dimerisation
domain)

- Location within SEQ ID NO 2142: from 1 to 59 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1028

- gi No. 497147

- Description: (U07334) MADS box domain [Asparagus officinalis]

- % Identity: 73.1

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 2142: from 10 to 35



Maximum Length Sequence: 

related to: 
Clone IDs: 

326193 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2143 

- Ceres seq_id 1600233 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2144 

- Ceres seq_id 1600234 

- Location of start within SEQ ID NO 2143: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2145 

- Ceres seq_id 1600235 

- Location of start within SEQ ID NO 2143: at 142 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Subtilase family 

- Location within SEQ ID NO 2145: from 1 to 92 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2146 

- Ceres seq_id 1600236 

- Location of start within SEQ ID NO 2143: at 238 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Subtilase family

- Location within SEQ ID NO 2146: from 1 to 60 aa.

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

326304 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2147 

- Ceres seq_id 1600249 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2148 

- Ceres seq_id 1600250

- Location of start within SEQ ID NO 2147: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pollen allergen

- Location within SEQ ID NO 2148: from 49 to 127 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2149 

- Ceres seq_id 1600251 

- Location of start within SEQ ID NO 2147: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Pollen allergen 

- Location within SEQ ID NO 2149: from 27 to 105 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2150 

- Ceres seq_id 1600252 

- Location of start within SEQ ID NO 2147: at 89 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Pollen allergen 

- Location within SEQ ID NO 2150: from 20 to 98 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

326502 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2151 

- Ceres seq_id 1600256 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2152 

- Ceres seq_id 1600257 

- Location of start within SEQ ID NO 2151: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- EF hand

- Location within SEQ ID NO 2152: from 134 to 161 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1029

- gi No. 5326544 

- Description: (Y18055) calcium dependent protein kinase [Arachis
hypogaea]

- % Identity: 73.4

- Alignment Length: 158

- Location of Alignment in SEQ ID NO 2152: from 1 to 158

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2153

- Ceres seq_id 1600258

- Location of start within SEQ ID NO 2151: at 7 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2153: from 132 to 159 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1030 

- gi No. 5326544 

- Description: (Y18055) calcium dependent protein kinase [Arachis
hypogaea]

- % Identity: 73.4 

- Alignment Length: 158 

- Location of Alignment in SEQ ID NO 2153: from 1 to 156

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2154 

- Ceres seq_id 1600259 

- Location of start within SEQ ID NO 2151: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2154: from 106 to 133 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1031 

- gi No. 5326544 

- Description: (Y18055) calcium dependent protein kinase [Arachis
hypogaea]

- % Identity: 73.4 

- Alignment Length: 158 

- Location of Alignment in SEQ ID NO 2154: from 1 to 130 



Maximum Length Sequence: 

related to: 
Clone IDs: 

326626 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2155 

- Ceres seq_id 1600269 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2156 

- Ceres seq_id 1600270 

- Location of start within SEQ ID NO 2155: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Polyprenyl synthetases 

- Location within SEQ ID NO 2156: from 3 to 152 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1032 

- gi No. 1346033 

- Description: FARNESYL PYROPHOSPHATE SYNTHETASE (FPP SYNTHETASE)
(FPS) (FARNESYL DIPHOSPHATE SYNTHETASE) (DIMETHYLALLYLTRANSFERASE /
GERANYLTRANSTRANSFERASE) >gi 1662368 (L39789) farnesyl pyrophosphate synthetase
[Zea mays]

- % Identity: 93 

- Alignment Length: 158

- Location of Alignment in SEQ ID NO 2156: from 3 to 152 
Maximum Length Sequence:

related to: 
Clone IDs: 

326761 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2157 

- Ceres seq_id 1600285 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2158 

- Ceres seq_id 1600286 

- Location of start within SEQ ID NO 2157: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2159 

- Ceres seq_id 1600287 

- Location of start within SEQ ID NO 2157: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1033 

- gi No. 1944132 

- Description: (AB002560) CUC2 [Arabidopsis thaliana] 

- % Identity: 79.2 

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 2159: from 56 to 79 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2160 

- Ceres seq_id 1600288 

- Location of start within SEQ ID NO 2157: at 138 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1034 

- gi No. 1944132 

- Description: (AB002560) CUC2 [Arabidopsis thaliana] 

- % Identity: 79.2

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 2160: from 11 to 34 

Maximum Length Sequence: 

related to: 
Clone IDs: 

326764 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2161 

- Ceres seq_id 1600289 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2162 

- Ceres seq_id 1600290 

- Location of start within SEQ ID NO 2161: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Carboxyl transferase domain 
- Location within SEQ ID NO 2162: from 1 to 104 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2163 

- Ceres seq_id 1600291 

- Location of start within SEQ ID NO 2161: at 33 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Carboxyl transferase domain 

- Location within SEQ ID NO 2163: from 1 to 94 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2164 

- Ceres seq_id 1600292 

- Location of start within SEQ ID NO 2161: at 84 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Carboxyl transferase domain 

- Location within SEQ ID NO 2164: from 1 to 77 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

326767 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2165 

- Ceres seq_id 1600293 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2166 

- Ceres seq_id 1600294 

- Location of start within SEQ ID NO 2165: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Thiolase 

- Location within SEQ ID NO 2166: from 13 to 100 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1035 

- gi No. 5531937 

- Description: (AF113522) acetoacetyl CoA thiolase [Zea mays] 

- % Identity: 98.1 

- Alignment Length: 54 

- Location of Alignment in SEQ ID NO 2166: from 44 to 97 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2167 

- Ceres seq_id 1600295 

- Location of start within SEQ ID NO 2165: at 36 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thiolase 
- Location within SEQ ID NO 2167: from 2 to 89 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1036 

- gi No. 5531937 

- Description: (AF113522) acetoacetyl CoA thiolase [Zea mays] 

- % Identity: 98.1 

- Alignment Length: 54 

- Location of Alignment in SEQ ID NO 2167: from 33 to 86

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2168 

- Ceres seq_id 1600296 

- Location of start within SEQ ID NO 2165: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thiolase 

- Location within SEQ ID NO 2168: from 1 to 72 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1037 

- gi No. 5531937 

- Description: (AF113522) acetoacetyl CoA thiolase [Zea mays] 

- % Identity: 98.1 

- Alignment Length: 54 

- Location of Alignment in SEQ ID NO 2168: from 16 to 69 

Maximum Length Sequence:

related to: 
Clone IDs: 

326933 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2169 

- Ceres seq_id 1600310 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2170 

- Ceres seq_id 1600311 

- Location of start within SEQ ID NO 2169: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2171 

- Ceres seq_id 1600312 

- Location of start within SEQ ID NO 2169: at 245 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1038 

- gi No. 4164473 

- Description: (AF061157) negatively light-regulated protein 
[Vernicia fordii] 

- % Identity: 78.7 

- Alignment Length: 47

- Location of Alignment in SEQ ID NO 2171: from 38 to 83 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2172

- Ceres seq_id 1600313

- Location of start within SEQ ID NO 2169: at 302 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1039 

- gi No. 4164473 

- Description: (AF061157) negatively light-regulated protein 
[Vernicia fordii] 

- % Identity: 78.7 

- Alignment Length: 47

- Location of Alignment in SEQ ID NO 2172: from 19 to 64

Maximum Length Sequence:

related to: 
Clone IDs: 

327124 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2173 

- Ceres seq_id 1600322
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2174 

- Ceres seq_id 1600323 

- Location of start within SEQ ID NO 2173: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein L14

- Location within SEQ ID NO 2174: from 27 to 148 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1040

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi 1310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 97.1 

- Alignment Length: 140 

- Location of Alignment in SEQ ID NO 2174: from 9 to 148 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2175 

- Ceres seq_id 1600324 

- Location of start within SEQ ID NO 2173: at 26 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 2175: from 19 to 140 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1041 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi 1310933 (L18915) 60S 
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 97.1 

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 2175: from 1 to 140 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2176

- Ceres seq_id 1600325

- Location of start within SEQ ID NO 2173: at 71 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 2176: from 4 to 125 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1042 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi 1310933 (L18915) 60S 
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 97.1 

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 2176: from 1 to 125

Maximum Length Sequence:

related to: 
Clone IDs: 

327312 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2177 

- Ceres seq_id 1600326 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2178 

- Ceres seq_id 1600327 

- Location of start within SEQ ID NO 2177: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2179 

- Ceres seq_id 1600328

- Location of start within SEQ ID NO 2177: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2180 

- Ceres seq_id 1600329 

- Location of start within SEQ ID NO 2177: at 119 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thiolase 

- Location within SEQ ID NO 2180: from 48 to 119 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1043 

- gi No. 393707 

- Description: (X67696) acetyl-CoA acyltransferase [Cucumis
sativus]

- % Identity: 70.6 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 2180: from 1 to 119 
Maximum Length Sequence:

related to: 
Clone IDs: 

327404 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2181 

- Ceres seq_id 1600333 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2182 

- Ceres seq_id 1600334 

- Location of start within SEQ ID NO 2181: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 2Fe-2S iron-sulfur cluster binding domains 

- Location within SEQ ID NO 2182: from 45 to 109 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2183 

- Ceres seq_id 1600335 

- Location of start within SEQ ID NO 2181: at 44 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2184 

- Ceres seq_id 1600336

- Location of start within SEQ ID NO 2181: at 47 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

327734 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2185 

- Ceres seq_id 1600346 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2186 

- Ceres seq_id 1600347 

- Location of start within SEQ ID NO 2185: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2187 

- Ceres seq_id 1600348 

- Location of start within SEQ ID NO 2185: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Nuclear transition protein 2

- Location within SEQ ID NO 2187: from 6 to 89 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2188 

- Ceres seq_id 1600349

- Location of start within SEQ ID NO 2185: at 83 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

327995 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2189 

- Ceres seq_id 1600361 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2190 

- Ceres seq_id 1600362 

- Location of start within SEQ ID NO 2189: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2191 

- Ceres seq_id 1600363 

- Location of start within SEQ ID NO 2189: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Histone-like transcription factors (CBF/NF-Y) and archaeal
histones.

- Location within SEQ ID NO 2191: from 111 to 160 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1044

- gi No. 5257260 

- Description: (AP000364) Similar to sequence of BAC F7G19 from 
Arabidopsis thaliana. (AC000106) [Oryza sativa] 

- % Identity: 73.4 

- Alignment Length: 143 

- Location of Alignment in SEQ ID NO 2191: from 29 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2192 

- Ceres seq_id 1600364 

- Location of start within SEQ ID NO 2189: at 86 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Histone-like transcription factors (CBF/NF-Y) and archaeal
histones.

- Location within SEQ ID NO 2192: from 83 to 132 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1045 

- gi No. 5257260 

- Description: (AP000364) Similar to sequence of BAC F7G19 from 
Arabidopsis thaliana. (AC000106) [Oryza sativa] 

- % Identity: 73.4 

- Alignment Length: 143 

- Location of Alignment in SEQ ID NO 2192: from 1 to 132 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328211 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2193 

- Ceres seq_id 1600383 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2194 

- Ceres seq_id 1600384 

- Location of start within SEQ ID NO 2193: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2195 

- Ceres seq_id 1600385 

- Location of start within SEQ ID NO 2193: at 221 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2196 

- Ceres seq_id 1600386

- Location of start within SEQ ID NO 2193: at 288 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1046

- gi No. 119111 

- Description: EBNA-2 NUCLEAR PROTEIN >gi|1083964|pir||S42442 EBNA2
protein - human herpesvirus 4 >gi|1632787|emb|CAA24877.1| (V01555) BYRF1,
encodes EBNA-2 (Dambaugh et al, 1984; Dillner et al, 1984) [Human herpesvirus
4]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2196: from 44 to 55 



Maximum Length Sequence:

related to: 
Clone IDs: 

328408 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2197 

- Ceres seq_id 1600395 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2198 
- Ceres seq_id 1600396

- Location of start within SEQ ID NO 2197: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal S3Ae family

- Location within SEQ ID NO 2198: from 37 to 110 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1047

- gi No. 1350986 

- Description: 40S RIBOSOMAL PROTEIN S3A (CYC07 PROTEIN) 
>gi|483431|dbj|BAA05059| (D26060) cyc07 [Oryza sativa]

- % Identity: 93.2 

- Alignment Length: 44

- Location of Alignment in SEQ ID NO 2198: from 26 to 69 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2199 

- Ceres seq_id 1600397 

- Location of start within SEQ ID NO 2197: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Protamine P1

- Location within SEQ ID NO 2199: from 10 to 61 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1048

- gi No. 464638 

- Description: 60S RIBOSOMAL PROTEIN L41 >gi|481283|pir||S38425
ribosomal protein GL41 - upland cotton >gi|407801|emb|CAA53175| (X75423)
ribosomal protein 41, 60S subunit [Gossypium hirsutum] >gi 1825784 (U26255)
ribosomal protein L41

- % Identity: 100 

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 2199: from 89 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2200 

- Ceres seq_id 1600398 

- Location of start within SEQ ID NO 2197: at 77 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal S3Ae family

- Location within SEQ ID NO 2200: from 12 to 85 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1049 

- gi No. 1350986 

- Description: 40S RIBOSOMAL PROTEIN S3A (CYC07 PROTEIN) 
>gi|483431|dbj|BAA05059| (D26060) cyc07 [Oryza sativa]

- % Identity: 93.2

- Alignment Length: 44

- Location of Alignment in SEQ ID NO 2200: from 1 to 44 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328415 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2201 

- Ceres seq_id 1600399 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2202 

- Ceres seq_id 1600400 

- Location of start within SEQ ID NO 2201: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2203 

- Ceres seq_id 1600401 

- Location of start within SEQ ID NO 2201: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 2203: from 36 to 87 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1050 

- gi No. 899608 

- Description: (U29158) polyubiquitin [Zea mays] 

- % Identity: 83.6 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 2203: from 21 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2204 

- Ceres seq_id 1600402 

- Location of start within SEQ ID NO 2201: at 108 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ubiquitin family

- Location within SEQ ID NO 2204: from 1 to 52 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1051 

- gi No. 899608 

- Description: (U29158) polyubiquitin [Zea mays] 

- % Identity: 83.6 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 2204: from 1 to 52 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328682 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2205 

- Ceres seq_id 1600434 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2206 

- Ceres seq_id 1600435 

- Location of start within SEQ ID NO 2205: at 38 nt. 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2206: from 44 to 81 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

328731 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2207 

- Ceres seq_id 1600436 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2208 

- Ceres seq_id 1600437 

- Location of start within SEQ ID NO 2207: at 93 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- S-adenosylmethionine synthetase 

- Location within SEQ ID NO 2208: from 5 to 119 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1052 

- gi No. 3024122 

- Description: S-ADENOSYLMETHIONINE SYNTHETASE 2 (METHIONINE
ADENOSYLTRANSFERASE 2) (ADOMET SYNTHETASE 2) >gi|1778821 (U82833) S-adenosyl-
L-methionine synthetase [Oryza sativa]

- % Identity: 96.7 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 2208: from 1 to 119 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2209 

- Ceres seq_id 1600438 

- Location of start within SEQ ID NO 2207: at 188 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2210

- Ceres seq_id 1600439

- Location of start within SEQ ID NO 2207: at 246 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- S-adenosylmethionine synthetase 

- Location within SEQ ID NO 2210: from 1 to 68 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1053 

- gi No. 3024122 

- Description: S-ADENOSYLMETHIONINE SYNTHETASE 2 (METHIONINE
ADENOSYLTRANSFERASE 2) (ADOMET SYNTHETASE 2) >gi|1778821 (U82833) S-adenosyl-
L-methionine synthetase [Oryza sativa]

- % Identity: 96.7 

- Alignment Length: 120 






- Location of Alignment in SEQ ID NO 2210: from 1 to 68 



Maximum Length Sequence: 

related to: 
Clone IDs: 

328859 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2211 

- Ceres seq_id 1600444 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2212 

- Ceres seq_id 1600445 

- Location of start within SEQ ID NO 2211: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2213 

- Ceres seq_id 1600446 

- Location of start within SEQ ID NO 2211: at 170 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 2213: from 9 to 82 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

328868 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2214 

- Ceres seq_id 1600451 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2215 

- Ceres seq_id 1600452 

- Location of start within SEQ ID NO 2214: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S12 

- Location within SEQ ID NO 2215: from 44 to 158 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1054 

- gi No. 2829742 

- Description: 40S RIBOSOMAL PROTEIN S23 >gi|1754684 (U81008)
ribosomal protein S23 [Brugia malayi] 

- % Identity: 79.7 

- Alignment Length: 123 

- Location of Alignment in SEQ ID NO 2215: from 38 to 158 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2216 

- Ceres seq_id 1600453

- Location of start within SEQ ID NO 2214: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Ribosomal protein S12 

- Location within SEQ ID NO 2216: from 7 to 121 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1055 

- gi No. 2829742 

- Description: 40S RIBOSOMAL PROTEIN S23 >gi|1754684 (U81008)
ribosomal protein S23 [Brugia malayi] 

- % Identity: 79.7 

- Alignment Length: 123 

- Location of Alignment in SEQ ID NO 2216: from 1 to 121 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2217 

- Ceres seq_id 1600454 

- Location of start within SEQ ID NO 2214: at 130 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S12 

- Location within SEQ ID NO 2217: from 1 to 115 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1056 

- gi No. 2829742 

- Description: 40S RIBOSOMAL PROTEIN S23 >gi|1754684 (U81008)
ribosomal protein S23 [Brugia malayi] 

- % Identity: 79.7 

- Alignment Length: 123 

- Location of Alignment in SEQ ID NO 2217: from 1 to 115 

Maximum Length Sequence:

related to: 
Clone IDs: 

329154 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2218 

- Ceres seq_id 1600459 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2219 

- Ceres seq_id 1600460 

- Location of start within SEQ ID NO 2218: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2219: from 95 to 166 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1057 

- gi No. 2500522 

- Description: EUKARYOTIC INITIATION FACTOR 4A (EIF-4A) >gi|603190
(U17979) translation initiation factor eIF-4A [Zea mays] 

- % Identity: 100 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 2219: from 46 to 166 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2220 
- Ceres seq_id 1600461

- Location of start within SEQ ID NO 2218: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2220: from 50 to 121 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1058 

- gi No. 2500522 

- Description: EUKARYOTIC INITIATION FACTOR 4A (EIF-4A) >gi|603190
(U17979) translation initiation factor eIF-4A [Zea mays] 

- % Identity: 100 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 2220: from 1 to 121 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2221 

- Ceres seq_id 1600462 

- Location of start within SEQ ID NO 2218: at 193 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2221: from 31 to 102 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1059 

- gi No. 2500522 

- Description: EUKARYOTIC INITIATION FACTOR 4A (EIF-4A) >gi|603190
(U17979) translation initiation factor eIF-4A [Zea mays] 

- % Identity: 100 

- Alignment Length: 122 

- Location of Alignment in SEQ ID NO 2221: from 1 to 102 



Maximum Length Sequence: 

related to: 
Clone IDs: 

329163 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2222 

- Ceres seq_id 1600463 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2223 

- Ceres seq_id 1600464

- Location of start within SEQ ID NO 2222: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2224 

- Ceres seq_id 1600465

- Location of start within SEQ ID NO 2222: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 2224: from 19 to 115 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2225 

- Ceres seq_id 1600466 

- Location of start within SEQ ID NO 2222: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329265 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2226

- Ceres seq_id 1600471 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2227 

- Ceres seq_id 1600472

- Location of start within SEQ ID NO 2226: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1060

- gi No. 1765899

- Description: (Y07917) Spot 3 protein [Arabidopsis thaliana]
>gi|1839244 (U86700) EGF receptor like protein [Arabidopsis thaliana]

- % Identity: 81.7 

- Alignment Length: 131 

- Location of Alignment in SEQ ID NO 2227: from 1 to 131 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2228 

- Ceres seq_id 1600473 

- Location of start within SEQ ID NO 2226: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1061

- gi No. 1765899

- Description: (Y07917) Spot 3 protein [Arabidopsis thaliana]
>gi|1839244 (U86700) EGF receptor like protein [Arabidopsis thaliana]

- % Identity: 81.7 

- Alignment Length: 131 

- Location of Alignment in SEQ ID NO 2228: from 1 to 86

Maximum Length Sequence: 

related to: 
Clone IDs: 

329340 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2229 

- Ceres seq_id 1600478 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2230 

- Ceres seq_id 1600479 
- Location of start within SEQ ID NO 2229: at 34 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Chalcone and stilbene synthases 

- Location within SEQ ID NO 2230: from 17 to 151 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1062 

- gi No. 5123702 

- Description: (AL079347) chalcone synthase-like protein 
[Arabidopsis thaliana] 

- % Identity: 73 

- Alignment Length: 137 

- Location of Alignment in SEQ ID NO 2230: from 16 to 151 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2231 

- Ceres seq_id 1600480 

- Location of start within SEQ ID NO 2229: at 67 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Chalcone and stilbene synthases 

- Location within SEQ ID NO 2231: from 6 to 140 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1063

- gi No. 5123702 

- Description: (AL079347) chalcone synthase-like protein 
[Arabidopsis thaliana] 

- % Identity: 73 

- Alignment Length: 137 

- Location of Alignment in SEQ ID NO 2231: from 5 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2232 

- Ceres seq_id 1600481

- Location of start within SEQ ID NO 2229: at 136 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Chalcone and stilbene synthases 

- Location within SEQ ID NO 2232: from 1 to 117 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1064

- gi No. 5123702 

- Description: (AL079347) chalcone synthase-like protein 
[Arabidopsis thaliana] 



- % Identity: 73 

- Alignment Length: 137 

- Location of Alignment in SEQ ID NO 2232: from 1 to 117 



Maximum Length Sequence: 

related to: 
Clone IDs: 

329475 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2233 

- Ceres seq_id 1600496
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2234 

- Ceres seq_id 1600497 

- Location of start within SEQ ID NO 2233: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2235 

- Ceres seq_id 1600498 

- Location of start within SEQ ID NO 2233: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2236 

- Ceres seq_id 1600499 

- Location of start within SEQ ID NO 2233: at 109 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1065

- gi No. 2117937

- Description: UTP--glucose-1-phosphate uridylyltransferase (EC
2.7.7.9) - barley >gi|1212996|emb|CAA62689| (X91347) UDP-glucose
pyrophosphorylase [Hordeum vulgare]

- % Identity: 86.6 

- Alignment Length: 134 

- Location of Alignment in SEQ ID NO 2236: from 1 to 133 

Maximum Length Sequence:

related to: 
Clone IDs: 

329488 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2237 

- Ceres seq_id 1600500 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2238 

- Ceres seq_id 1600501 

- Location of start within SEQ ID NO 2237: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2239 

- Ceres seq_id 1600502

- Location of start within SEQ ID NO 2237: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 2239: from 63 to 151 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1066 

- gi No. 3913182 

- Description: CINNAMYL-ALCOHOL DEHYDROGENASE (CAD)
>gi|2239258|emb|CAA74070| (Y13733) cinnamyl alcohol dehydrogenase [Zea mays]

- % Identity: 99 

- Alignment Length: 104

- Location of Alignment in SEQ ID NO 2239: from 50 to 151 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2240 

- Ceres seq_id 1600503 

- Location of start within SEQ ID NO 2237: at 149 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Zinc-binding dehydrogenases 

- Location within SEQ ID NO 2240: from 14 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1067

- gi No. 3913182 

- Description: CINNAMYL-ALCOHOL DEHYDROGENASE (CAD)
>gi|2239258|emb|CAA74070| (Y13733) cinnamyl alcohol dehydrogenase [Zea mays]

- % Identity: 99 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2240: from 1 to 102 



Maximum Length Sequence:

related to: 
Clone IDs: 

329498 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2241 

- Ceres seq_id 1600504 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2242 

- Ceres seq_id 1600505 

- Location of start within SEQ ID NO 2241: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1068

- gi No. 5123787 

- Description: (AJ007041) trithorax homologue 2 [Homo sapiens] 

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2242: from 144 to 154 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2243 

- Ceres seq_id 1600506 

- Location of start within SEQ ID NO 2241: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2243: from 2 to 124 aa. 



(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2244

- Ceres seq_id 1600507 

- Location of start within SEQ ID NO 2241: at 71 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329518 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2245 

- Ceres seq_id 1600512
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2246 

- Ceres seq_id 1600513 

- Location of start within SEQ ID NO 2245: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2247 

- Ceres seq_id 1600514 

- Location of start within SEQ ID NO 2245: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2248 

- Ceres seq_id 1600515

- Location of start within SEQ ID NO 2245: at 305 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1069 

- gi No. 2894607 

- Description: (AL021889) NAM (no apical meristem)-like protein
[Arabidopsis thaliana]

- % Identity: 75 

- Alignment Length: 44

- Location of Alignment in SEQ ID NO 2248: from 6 to 48 



Maximum Length Sequence: 

related to: 
Clone IDs: 

319810 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2249

- Ceres seq_id 1600528 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2250 

- Ceres seq_id 1600529

- Location of start within SEQ ID NO 2249: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1070

- gi No. 1143511 

- Description: (Z47076) Ser/Thr protein phosphatase homologous to
PPX [Malus domestica] >gi|1586034|prf||2202340A Ser/Thr protein phosphatase
[Malus domestica]

- % Identity: 92 

- Alignment Length: 88

- Location of Alignment in SEQ ID NO 2250: from 79 to 95

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2251 

- Ceres seq_id 1600530 

- Location of start within SEQ ID NO 2249: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

319967 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2252 

- Ceres seq_id 1600535 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2253 

- Ceres seq_id 1600536 

- Location of start within SEQ ID NO 2252: at 132 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- emp24/gp25L/p24 family 

- Location within SEQ ID NO 2253: from 8 to 122 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

320043 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2254 

- Ceres seq_id 1600537 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2255 

- Ceres seq_id 1600538 

- Location of start within SEQ ID NO 2254: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2256

- Ceres seq_id 1600539

- Location of start within SEQ ID NO 2254: at 2 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- short chain dehydrogenase 

- Location within SEQ ID NO 2256: from 66 to 129 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2257 

- Ceres seq_id 1600540 

- Location of start within SEQ ID NO 2254: at 149 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- short chain dehydrogenase 

- Location within SEQ ID NO 2257: from 17 to 80 aa. 



Maximum Length Sequence:

related to: 
Clone IDs: 

320898 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2258 

- Ceres seq_id 1600552
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2259 

- Ceres seq_id 1600553 

- Location of start within SEQ ID NO 2258: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2260 

- Ceres seq_id 1600554 

- Location of start within SEQ ID NO 2258: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1071 

- gi No. 3132288 

- Description: (AB011490) short ORF [TT virus] 

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2260: from 121 to 137 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2261 

- Ceres seq_id 1600555 

- Location of start within SEQ ID NO 2258: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2261: from 6 to 138 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1072 

- gi No. 624076 

- Description: (U42580) contains Pro-rich Px motifs: SPKPP (20X) 
PEPPA (9X); similar to soybean pro-rich cell wall protein, corresponds to 
Swiss-Prot Accession Number P13993 [Paramecium bursaria Chlorella virus 1] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2261: from 82 to 95 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321554 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2262

- Ceres seq_id 1600559 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2263 

- Ceres seq_id 1600560 

- Location of start within SEQ ID NO 2262: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1073 

- gi No. 1170401 

- Description: SPERM PROTAMINE P1 >gi|598338 (L35446) protamine
[Notoryctes typhlops] >gi|1582122|prf||2117429Q protamine P1 [Notoryctes
typhlops]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2263: from 65 to 75 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2264 

- Ceres seq_id 1600561 

- Location of start within SEQ ID NO 2262: at 109 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321625 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2265 

- Ceres seq_id 1600562 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2266 

- Ceres seq_id 1600563 

- Location of start within SEQ ID NO 2265: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2267

- Ceres seq_id 1600564

- Location of start within SEQ ID NO 2265: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Tubulin 

- Location within SEQ ID NO 2267: from 43 to 106 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1074 

- gi No. 401161 

- Description: TUBULIN ALPHA-5 CHAIN >gi|322879|pir||S28982 tubulin
alpha-5 chain - maize >gi|22156|emb|CAA44862| (X63177) alpha-tubulin #5 [Zea
mays] >gi|450293 (L27815) alpha-tubulin [Zea mays] >gi|452474 (U05258)
alpha-tubulin [Zea mays]

- % Identity: 98.4 

- Alignment Length: 64 

- Location of Alignment in SEQ ID NO 2267: from 43 to 106 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2268 

- Ceres seq_id 1600565 

- Location of start within SEQ ID NO 2265: at 225 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Tubulin 

- Location within SEQ ID NO 2268: from 31 to 91 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1075 

- gi No. 401161 

- Description: TUBULIN ALPHA-5 CHAIN >gi|322879|pir||S28982 tubulin
alpha-5 chain - maize >gi|22156|emb|CAA44862| (X63177) alpha-tubulin #5 [Zea
mays] >gi|450293 (L27815) alpha-tubulin [Zea mays] >gi|452474 (U05258)
alpha-tubulin [Zea mays]

- % Identity: 100 

- Alignment Length: 61 

- Location of Alignment in SEQ ID NO 2268: from 31 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

321788 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2269 

- Ceres seq_id 1600574 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2270 

- Ceres seq_id 1600575 

- Location of start within SEQ ID NO 2269: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- linker histone H1 and H5 family

- Location within SEQ ID NO 2270: from 33 to 107 aa. 
(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 2271 

- Ceres seq_id 1600576 

- Location of start within SEQ ID NO 2269: at 49 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- linker histone H1 and H5 family

- Location within SEQ ID NO 2271: from 17 to 91 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2272 

- Ceres seq_id 1600577 

- Location of start within SEQ ID NO 2269: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- linker histone H1 and H5 family

- Location within SEQ ID NO 2272: from 1 to 70 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322757 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2273 

- Ceres seq_id 1600599 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2274 

- Ceres seq_id 1600600 

- Location of start within SEQ ID NO 2273: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Proteasome A-type and B-type 

- Location within SEQ ID NO 2274: from 34 to 140 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1076

- gi No. 2511574 

- Description: (Y13176) multicatalytic endopeptidase [Arabidopsis
thaliana] >gi|3421075 (AF043520) 20S proteasome subunit PAB1 [Arabidopsis
thaliana] >gi|4966368|gb|AAD34699.1|AC006341_27 (AC006341) Identical to
gb|Y13176 Arabidopsis thaliana mRNA for

- % Identity: 78.7 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 2274: from 1 to 140 

Maximum Length Sequence: 

related to: 
Clone IDs: 

322831 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2275 

- Ceres seq_id 1600603 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2276 

- Ceres seq_id 1600604

- Location of start within SEQ ID NO 2275: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2277 

- Ceres seq_id 1600605 

- Location of start within SEQ ID NO 2275: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 2277: from 44 to 109 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2278 

- Ceres seq_id 1600606 

- Location of start within SEQ ID NO 2275: at 122 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 14-3-3 proteins 

- Location within SEQ ID NO 2278: from 4 to 69 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323674 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2279

- Ceres seq_id 1600624 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2280 

- Ceres seq_id 1600625 

- Location of start within SEQ ID NO 2279: at 82 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1077 

- gi No. 543708 

- Description: wheat aluminum induced protein wali 3 - wheat
>gi|170791 (L11881) wali3 [Triticum aestivum]

- % Identity: 82.4 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2280: from 1 to 17 

Maximum Length Sequence:

related to: 
Clone IDs: 

323807 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2281 

- Ceres seq_id 1600634 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2282

- Ceres seq_id 1600635

- Location of start within SEQ ID NO 2281: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2283 

- Ceres seq_id 1600636 

- Location of start within SEQ ID NO 2281: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2283: from 41 to 118 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2284 

- Ceres seq_id 1600637 

- Location of start within SEQ ID NO 2281: at 119 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

323813 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2285 

- Ceres seq_id 1600638 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2286 

- Ceres seq_id 1600639 

- Location of start within SEQ ID NO 2285: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2287 

- Ceres seq_id 1600640 

- Location of start within SEQ ID NO 2285: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2288 

- Ceres seq_id 1600641 

- Location of start within SEQ ID NO 2285: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1078

- gi No. 465445 

- Description: PROBABLE NUCLEAR ANTIGEN >gi|418708|pir||B45344
probable nuclear antigen - suid herpesvirus 1 (strain Kaplan) >gi|334072
(M34651) ORF-3 protein [Pseudorabies virus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2288: from 42 to 52 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324439 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2289 

- Ceres seq_id 1600644 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2290 

- Ceres seq_id 1600645 

- Location of start within SEQ ID NO 2289: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L19e 

- Location within SEQ ID NO 2290: from 26 to 148 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1079

- gi No. 3377797 

- Description: (AF075597) Similar to 60S ribosome protein L19; 
coded for by A. thaliana cDNA T04719; coded for by A. thaliana cDNA H36046;
coded for by A. thaliana cDNA T44067; coded for by A. thaliana cDNA 

- % Identity: 79.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 2290: from 25 to 148 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2291 

- Ceres seq_id 1600646 

- Location of start within SEQ ID NO 2289: at 74 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L19e 

- Location within SEQ ID NO 2291: from 2 to 124 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1080 

- gi No. 3377797 

- Description: (AF075597) Similar to 60S ribosome protein L19; 
coded for by A. thaliana cDNA T04719; coded for by A. thaliana cDNA H36046;
coded for by A. thaliana cDNA T44067; coded for by A. thaliana cDNA

- % Identity: 79.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 2291: from 1 to 124 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2292

- Ceres seq_id 1600647 

- Location of start within SEQ ID NO 2289: at 173 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein L19e 

- Location within SEQ ID NO 2292: from 1 to 91 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1081 

- gi No. 3377797 

- Description: (AF075597) Similar to 60S ribosome protein L19; 
coded for by A. thaliana cDNA T04719; coded for by A. thaliana cDNA H36046;
coded for by A. thaliana cDNA T44067; coded for by A. thaliana cDNA

- % Identity: 79.2 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 2292: from 1 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324519 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2293 

- Ceres seq_id 1600652 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2294 

- Ceres seq_id 1600653 

- Location of start within SEQ ID NO 2293: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2295 

- Ceres seq_id 1600654 

- Location of start within SEQ ID NO 2293: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1082 

- gi No. 3249109 

- Description: (AC003114) Contains similarity to pre-mRNA splicing 
factor (SF2), P33 subunit gb|M72709 from Homo sapiens. ESTs gb|T42588 and
gb|R65514 come from this gene. [Arabidopsis thaliana] 

- % Identity: 85.7 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2295: from 30 to 78 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2296 

- Ceres seq_id 1600655 

- Location of start within SEQ ID NO 2293: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1083 

- gi No. 3249109

- Description: (AC003114) Contains similarity to pre-mRNA splicing
factor (SF2), P33 subunit gb|M72709 from Homo sapiens. ESTs gb|T42588 and
gb|R65514 come from this gene. [Arabidopsis thaliana]

- % Identity: 85.7 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2296: from 1 to 49 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324538 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2297 

- Ceres seq_id 1600660 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2298 

- Ceres seq_id 1600661 

- Location of start within SEQ ID NO 2297: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2299 

- Ceres seq_id 1600662 

- Location of start within SEQ ID NO 2297: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2300 

- Ceres seq_id 1600663 

- Location of start within SEQ ID NO 2297: at 221 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1084

- gi No. 3421096 

- Description: (AF043528) 20S proteasome subunit PAG1 [Arabidopsis
thaliana] >gi|3885332 (AC005623) proteasome component [Arabidopsis thaliana]

- % Identity: 91.8 

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2300: from 1 to 49

Maximum Length Sequence:

related to: 
Clone IDs: 

325303 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2301 

- Ceres seq_id 1600688 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2302 

- Ceres seq_id 1600689 

- Location of start within SEQ ID NO 2301: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2303 

- Ceres seq_id 1600690 

- Location of start within SEQ ID NO 2301: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1085 

- gi No. 4996642 

- Description: (AB028130) Dof zinc finger protein [Oryza sativa] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2303: from 12 to 25 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2304 

- Ceres seq_id 1600691 

- Location of start within SEQ ID NO 2301: at 101 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325363 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2305 

- Ceres seq_id 1600707 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2306 

- Ceres seq_id 1600708 

- Location of start within SEQ ID NO 2305: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Mucin-like glycoprotein 

- Location within SEQ ID NO 2306: from 20 to 79 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2307 

- Ceres seq_id 1600709 

- Location of start within SEQ ID NO 2305: at 85 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Mucin-like glycoprotein

- Location within SEQ ID NO 2307: from 1 to 51 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to:
Clone IDs:

325453 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2308 

- Ceres seq_id 1600721 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2309 

- Ceres seq_id 1600722 

- Location of start within SEQ ID NO 2308: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1086 

- gi No. 404466 

- Description: (S64499) H3 histone [Styela plicata, sperm, Peptide, 
136 aa] [Styela plicata] 

- % Identity: 94.4 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 2309: from 26 to 43 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2310 

- Ceres seq_id 1600723 

- Location of start within SEQ ID NO 2308: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2311 

- Ceres seq_id 1600724 

- Location of start within SEQ ID NO 2308: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

325548 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2312 

- Ceres seq_id 1600739 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2313 

- Ceres seq_id 1600740 

- Location of start within SEQ ID NO 2312: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1087 

- gi No. 3688319 

- Description: (Y17562) surface protein [Hepatitis B virus] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2313: from 34 to 45 



Maximum Length Sequence: 

related to: 
Clone IDs: 

325632 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2314 

- Ceres seq_id 1600744 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2315 

- Ceres seq_id 1600745 

- Location of start within SEQ ID NO 2314: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2316 

- Ceres seq_id 1600746 

- Location of start within SEQ ID NO 2314: at 148 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2317 

- Ceres seq_id 1600747 

- Location of start within SEQ ID NO 2314: at 249 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1088

- gi No. 1917019 

- Description: (U92045) ribosomal protein S6 RPS6-1 [Zea mays] 

- % Identity: 94.5 

- Alignment Length: 91 

- Location of Alignment in SEQ ID NO 2317: from 1 to 39 

Maximum Length Sequence:

related to: 
Clone IDs: 

327242 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2318 

- Ceres seq_id 1600768 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2319 

- Ceres seq_id 1600769 

- Location of start within SEQ ID NO 2318: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ras family 

- Location within SEQ ID NO 2319: from 51 to 167 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1089 

- gi No. 3024501

- Description: RAS-RELATED PROTEIN RAB11C >gi|1370146|emb|CAA98179|
(Z73951) RAB11C [Lotus japonicus]

- % Identity: 91.5 

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 2319: from 38 to 166 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2320 

- Ceres seq_id 1600770 

- Location of start within SEQ ID NO 2318: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2321 

- Ceres seq_id 1600771 

- Location of start within SEQ ID NO 2318: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ras family 

- Location within SEQ ID NO 2321: from 14 to 130 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 1090 

- gi No. 3024501 

- Description: RAS-RELATED PROTEIN RAB11C >gi|1370146|emb|CAA98179|
(Z73951) RAB11C [Lotus japonicus] 

- % Identity: 91.5 

- Alignment Length: 129

- Location of Alignment in SEQ ID NO 2321: from 1 to 129 



Maximum Length Sequence:

related to: 
Clone IDs: 

328029 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2322 

- Ceres seq_id 1600772 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2323 

- Ceres seq_id 1600773 

- Location of start within SEQ ID NO 2322: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Oxidoreductase family 

- Location within SEQ ID NO 2323: from 16 to 121 aa. 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2324 

- Ceres seq_id 1600774 

- Location of start within SEQ ID NO 2322: at 217 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Oxidoreductase family

- Location within SEQ ID NO 2324: from 1 to 88 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2325 

- Ceres seq_id 1600775 

- Location of start within SEQ ID NO 2322: at 223 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Oxidoreductase family 

- Location within SEQ ID NO 2325: from 1 to 86 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328100 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2326 

- Ceres seq_id 1600776
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2327 

- Ceres seq_id 1600777 

- Location of start within SEQ ID NO 2326: at 120 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1091

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2327: from 69 to 86

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2328 

- Ceres seq_id 1600778 

- Location of start within SEQ ID NO 2326: at 204 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1092

- gi No. 1710509 

- Description: 60S RIBOSOMAL PROTEIN L18A 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2328: from 41 to 58 



Maximum Length Sequence:

related to: 
Clone IDs: 

328245 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2329

- Ceres seq_id 1600782
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2330

- Ceres seq_id 1600783 

- Location of start within SEQ ID NO 2329: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2331 

- Ceres seq_id 1600784 

- Location of start within SEQ ID NO 2329: at 117 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Thiolase 

- Location within SEQ ID NO 2331: from 8 to 121 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1093 

- gi No. 1542941 

- Description: (X78116) Acetoacetyl-coenzyme A thiolase [Raphanus
sativus]

- % Identity: 77.4 

- Alignment Length: 124

- Location of Alignment in SEQ ID NO 2331: from 1 to 121 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2332 

- Ceres seq_id 1600785 

- Location of start within SEQ ID NO 2329: at 177 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Thiolase 

- Location within SEQ ID NO 2332: from 1 to 101 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1094

- gi No. 1542941 

- Description: (X78116) Acetoacetyl-coenzyme A thiolase [Raphanus
sativus]

- % Identity: 77.4 

- Alignment Length: 124

- Location of Alignment in SEQ ID NO 2332: from 1 to 101 



Maximum Length Sequence: 

related to: 
Clone IDs: 

328310 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2333 

- Ceres seq_id 1600786
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2334 

- Ceres seq_id 1600787 

- Location of start within SEQ ID NO 2333: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 
- Location within SEQ ID NO 2334: from 16 to 91 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1095

- gi No. 3335355 

- Description: (AC004512) Match to polyubiquitin DNA gb|L05401 from
A. thaliana. Contains insertion of mitochondrial NADH dehydrogenase gb|X82618
and gb|X98301. May be a pseudogene with an expressed insert. EST
gb|AA586248 comes from this region. [Arabi...

- % Identity: 98.8 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 2334: from 1 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2335 

- Ceres seq_id 1600788 

- Location of start within SEQ ID NO 2333: at 46 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 2335: from 1 to 76 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1096 

- gi No. 3335355 

- Description: (AC004512) Match to polyubiquitin DNA gb|L05401 from 
A. thaliana. Contains insertion of mitochondrial NADH dehydrogenase gb|X82618
and gb|X98301. May be a pseudogene with an expressed insert. EST
gb|AA586248 comes from this region. [Arabi... 

- % Identity: 98.8 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 2335: from 1 to 145 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328900 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2336 

- Ceres seq_id 1600790 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2337 

- Ceres seq_id 1600791 

- Location of start within SEQ ID NO 2336: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2338 

- Ceres seq_id 1600792

- Location of start within SEQ ID NO 2336: at 74 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2339 
- Ceres seq_id 1600793

- Location of start within SEQ ID NO 2336: at 304 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1097 

- gi No. 2655291 

- Description: (AF032974) germin-like protein 4 [Oryza sativa] 

- % Identity: 76.9 

- Alignment Length: 66

- Location of Alignment in SEQ ID NO 2339: from 1 to 61 

Maximum Length Sequence: 

related to: 
Clone IDs: 

328909 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2340 

- Ceres seq_id 1600794 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2341 

- Ceres seq_id 1600795 

- Location of start within SEQ ID NO 2340: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2342 

- Ceres seq_id 1600796 

- Location of start within SEQ ID NO 2340: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1098

- gi No. 2275260 

- Description: (U92813) tractin [Hirudo medicinalis] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2342: from 126 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2343 

- Ceres seq_id 1600797 

- Location of start within SEQ ID NO 2340: at 145 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329231 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2344

- Ceres seq_id 1600812 
(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2345

- Ceres seq_id 1600813 

- Location of start within SEQ ID NO 2344: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 2345: from 28 to 103 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1099

- gi No. 100934 

- Description: ubiquitin precursor Ubi-1 - maize
>gi|422037|pir||S20926 ubiquitin precursor Ubi-2 - maize >gi|248337|bbs|94465
(S94464) polyubiquitin (ubiquitin) [maize, Peptide, 533 aa] [Zea mays]

- % Identity: 95.3 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2345: from 24 to 107 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2346

- Ceres seq_id 1600814

- Location of start within SEQ ID NO 2344: at 84 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 2346: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1100 

- gi No. 100934 

- Description: ubiquitin precursor Ubi-1 - maize
>gi|422037|pir||S20926 ubiquitin precursor Ubi-2 - maize >gi|248337|bbs|94465
(S94464) polyubiquitin (ubiquitin) [maize, Peptide, 533 aa] [Zea mays]

- % Identity: 95.3 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2346: from 1 to 80

Maximum Length Sequence: 

related to: 
Clone IDs: 

325815 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2347 

- Ceres seq_id 1600820 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2348 

- Ceres seq_id 1600821 

- Location of start within SEQ ID NO 2347: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Cytochrome P450

- Location within SEQ ID NO 2348: from 95 to 145 aa . 
(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2349

- Ceres seq_id 1600822 

- Location of start within SEQ ID NO 2347: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Cytochrome P450 

- Location within SEQ ID NO 2349: from 48 to 103 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2350 

- Ceres seq_id 1600823 

- Location of start within SEQ ID NO 2347: at 97 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450 

- Location within SEQ ID NO 2350: from 63 to 113 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

324931 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2351 

- Ceres seq_id 1600835 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2352 

- Ceres seq_id 1600836 

- Location of start within SEQ ID NO 2351: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Zinc finger, C2H2 type 

- Location within SEQ ID NO 2352: from 54 to 75 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2353 

- Ceres seq_id 1600837 

- Location of start within SEQ ID NO 2351: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2354 

- Ceres seq_id 1600838 

- Location of start within SEQ ID NO 2351: at 148 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Zinc finger, C2H2 type 

- Location within SEQ ID NO 2354: from 5 to 26 aa . 
(Dp) Related Amino Acid Sequences



Maximum Length Sequence : 

related to: 
Clone IDs: 

326685 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2355 

- Ceres seq_id 1600844 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2356 

- Ceres seq_id 1600845 

- Location of start within SEQ ID NO 2355: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2356: from 59 to 126 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2357 

- Ceres seq_id 1600846 

- Location of start within SEQ ID NO 2355: at 154 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2357: from 8 to 75 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

329608 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2358 

- Ceres seq_id 1600851 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2359 

- Ceres seq_id 1600852 

- Location of start within SEQ ID NO 2358: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2360 

- Ceres seq_id 1600853 

- Location of start within SEQ ID NO 2358: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1101 

- gi No. 1346559 
- Description: DNA-BINDING PROTEIN MNB1A >gi|2130126|pir||S66358
DNA-binding protein MNB1a - maize >gi|517258|emb|CAA46875| (X66076)
DNA-binding protein [Zea mays]

- % Identity: 90.2 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2360: from 40 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2361 

- Ceres seq_id 1600854 

- Location of start within SEQ ID NO 2358: at 91 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

329612 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2362 

- Ceres seq_id 1600855 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2363 

- Ceres seq_id 1600856 

- Location of start within SEQ ID NO 2362: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1102 

- gi No. 548492 

- Description: EXOPOLYGALACTURONASE PRECURSOR (EXOPG) (PECTINASE)
(GALACTURAN 1,4-ALPHA-GALACTURONIDASE) >gi|629853|pir||S30066
polygalacturonase - maize >gi|288379|emb|CAA45751| (X64408) polygalacturonase
[Zea mays]

- % Identity: 94 

- Alignment Length: 117 

- Location of Alignment in SEQ ID NO 2363: from 19 to 106 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2364 

- Ceres seq_id 1600857 

- Location of start within SEQ ID NO 2362: at 27 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2365 

- Ceres seq_id 1600858 

- Location of start within SEQ ID NO 2362: at 48 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 
related to: 
Clone IDs:

329620 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2366 

- Ceres seq_id 1600859 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2367 

- Ceres seq_id 1600860

- Location of start within SEQ ID NO 2366: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2368 

- Ceres seq_id 1600861 

- Location of start within SEQ ID NO 2366: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2369 

- Ceres seq_id 1600862 

- Location of start within SEQ ID NO 2366: at 284 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- LIM domain containing proteins 

- Location within SEQ ID NO 2369: from 12 to 57 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1103 

- gi No. 4914322 

- Description: (AC005489) F14N23.8 [Arabidopsis thaliana] 

- % Identity: 89.1 

- Alignment Length: 55 

- Location of Alignment in SEQ ID NO 2369: from 4 to 57 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329758 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2370 

- Ceres seq_id 1600868 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2371 

- Ceres seq_id 1600869 

- Location of start within SEQ ID NO 2370: at 305 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1104 

- gi No. 3122060 

- Description: ELONGATION FACTOR 1-ALPHA (EF-1-ALPHA)
>gi|2598657|emb|CAA10847| (AJ222579) elongation factor 1-alpha (EF1-a) [Vicia
faba]
- % Identity: 100

- Alignment Length: 49

- Location of Alignment in SEQ ID NO 2371: from 1 to 48 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2372 

- Ceres seq_id 1600870 

- Location of start within SEQ ID NO 2370: at 348 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329844 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2373 

- Ceres seq_id 1600885 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2374

- Ceres seq_id 1600886

- Location of start within SEQ ID NO 2373: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2375 

- Ceres seq_id 1600887

- Location of start within SEQ ID NO 2373: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1105 

- gi No. 422850 

- Description: homeobox protein H6 - human 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2375: from 16 to 29 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329897 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2376 

- Ceres seq_id 1600896 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2377 

- Ceres seq_id 1600897 

- Location of start within SEQ ID NO 2376: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- metallopeptidase family M24 

- Location within SEQ ID NO 2377: from 1 to 93 aa . 
(Dp) Related Amino Acid Sequences

- Alignment No. 1106

- gi No. 4006893 

- Description: (Z99708) aminopeptidase-like protein [Arabidopsis
thaliana]

- % Identity: 80.8 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2377: from 1 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2378 

- Ceres seq_id 1600898 

- Location of start within SEQ ID NO 2376: at 180 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1107 

- gi No. 4006893 

- Description: (Z99708) aminopeptidase-like protein [Arabidopsis
thaliana]

- % Identity: 80.8 

- Alignment Length: 104 

- Location of Alignment in SEQ ID NO 2378: from 1 to 45 

Maximum Length Sequence: 

related to: 
Clone IDs: 

329931 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2379

- Ceres seq_id 1600899
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2380 

- Ceres seq_id 1600900 

- Location of start within SEQ ID NO 2379: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 2380: from 25 to 168 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1108 

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 99.3 

- Alignment Length: 148

- Location of Alignment in SEQ ID NO 2380: from 25 to 172 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2381

- Ceres seq_id 1600901

- Location of start within SEQ ID NO 2379: at 74 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2381: from 1 to 144 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 1109

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 99.3 

- Alignment Length: 148

- Location of Alignment in SEQ ID NO 2381: from 1 to 148 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2382 

- Ceres seq_id 1600902 

- Location of start within SEQ ID NO 2379: at 161 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 2382: from 1 to 115 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1110 

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 99.3 

- Alignment Length: 148 

- Location of Alignment in SEQ ID NO 2382: from 1 to 119



Maximum Length Sequence: 

related to: 
Clone IDs: 

329932 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2383 

- Ceres seq_id 1600903 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2384 

- Ceres seq_id 1600904 

- Location of start within SEQ ID NO 2383: at 3 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1111 

- gi No. 5360221 

- Description: (AB011262) nuclear transport factor 2 (NTF2) [Oryza
sativa]

- % Identity: 83.3 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 2384: from 60 to 149 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2385 

- Ceres seq_id 1600905 

- Location of start within SEQ ID NO 2383: at 174 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1112 

- gi No. 5360221 

- Description: (AB011262) nuclear transport factor 2 (NTF2) [Oryza
sativa]

- % Identity: 83.3 
- Alignment Length: 90

- Location of Alignment in SEQ ID NO 2385: from 3 to 92 

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2386 

- Ceres seq_id 1600906 

- Location of start within SEQ ID NO 2383: at 180 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1113 

- gi No. 5360221 

- Description: (AB011262) nuclear transport factor 2 (NTF2) [Oryza
sativa]

- % Identity: 83.3 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 2386: from 1 to 90

Maximum Length Sequence: 

related to: 
Clone IDs: 

330274 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2387 

- Ceres seq_id 1600932 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2388 

- Ceres seq_id 1600933 

- Location of start within SEQ ID NO 2387: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2389 

- Ceres seq_id 1600934 

- Location of start within SEQ ID NO 2387: at 190 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1114 

- gi No. 600771 

- Description: (L35844) G protein alpha subunit [Oryza sativa] 

- % Identity: 70.6 

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 2389: from 11 to 44 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2390 

- Ceres seq_id 1600935 

- Location of start within SEQ ID NO 2387: at 220 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1115 

- gi No. 600771 

- Description: (L35844) G protein alpha subunit [Oryza sativa] 
- % Identity: 70.6

- Alignment Length: 34 

- Location of Alignment in SEQ ID NO 2390: from 1 to 34 



Maximum Length Sequence: 

related to: 
Clone IDs: 

330499 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2391 

- Ceres seq_id 1600949 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2392 

- Ceres seq_id 1600950 

- Location of start within SEQ ID NO 2391: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 2392: from 15 to 120 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2393 

- Ceres seq_id 1600951 

- Location of start within SEQ ID NO 2391: at 101 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

330524 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2394 

- Ceres seq_id 1600952 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2395 

- Ceres seq_id 1600953 

- Location of start within SEQ ID NO 2394: at 29 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Polygalacturonase (pectinase) 

- Location within SEQ ID NO 2395: from 68 to 154 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1116 

- gi No. 548493 

- Description: EXOPOLYGALACTURONASE PRECURSOR (EXOPG) (PECTINASE)
(GALACTURAN 1,4-ALPHA-GALACTURONIDASE) >gi|629854|pir||S30067
polygalacturonase - maize >gi|288612|emb|CAA47052| (X66422) polygalacturonase
[Zea mays]



- % Identity: 85.4 

- Alignment Length: 158 

- Location of Alignment in SEQ ID NO 2395: from 1 to 154



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2396

- Ceres seq_id 1600954

- Location of start within SEQ ID NO 2394: at 50 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Polygalacturonase (pectinase) 

- Location within SEQ ID NO 2396: from 61 to 147 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1117 

- gi No. 548493 

- Description: EXOPOLYGALACTURONASE PRECURSOR (EXOPG) (PECTINASE)
(GALACTURAN 1,4-ALPHA-GALACTURONIDASE) >gi|629854|pir||S30067
polygalacturonase - maize >gi|288612|emb|CAA47052| (X66422) polygalacturonase
[Zea mays]

- % Identity: 85.4 

- Alignment Length: 158

- Location of Alignment in SEQ ID NO 2396: from 1 to 147 

Maximum Length Sequence: 

related to: 
Clone IDs: 

330528 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2397 

- Ceres seq_id 1600955 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2398 

- Ceres seq_id 1600956 

- Location of start within SEQ ID NO 2397: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2399 

- Ceres seq_id 1600957 

- Location of start within SEQ ID NO 2397: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2399: from 48 to 143 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2400 

- Ceres seq_id 1600958 

- Location of start within SEQ ID NO 2397: at 50 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2400: from 32 to 127 aa . 

(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 
related to:
Clone IDs: 

330599 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2401 

- Ceres seq_id 1600963 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2402 

- Ceres seq_id 1600964 

- Location of start within SEQ ID NO 2401: at 215 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2402: from 44 to 81 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1118 

- gi No. 2281093 

- Description: (AC002333) beta transducin isolog [Arabidopsis
thaliana]

- % Identity: 70.6 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2402: from 8 to 91 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2403 

- Ceres seq_id 1600965 

- Location of start within SEQ ID NO 2401: at 242 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2403: from 35 to 72 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1119 

- gi No. 2281093 

- Description: (AC002333) beta transducin isolog [Arabidopsis
thaliana]

- % Identity: 70.6 

- Alignment Length: 85

- Location of Alignment in SEQ ID NO 2403: from 1 to 82 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2404 

- Ceres seq_id 1600966 

- Location of start within SEQ ID NO 2401: at 269 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2404: from 26 to 63 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1120 

- gi No. 2281093 

- Description: (AC002333) beta transducin isolog [Arabidopsis
thaliana]

- % Identity: 70.6 

- Alignment Length: 85 
- Location of Alignment in SEQ ID NO 2404: from 1 to 73

Maximum Length Sequence : 

related to: 
Clone IDs: 

330624 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2405 

- Ceres seq_id 1600975
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2406 

- Ceres seq_id 1600976 

- Location of start within SEQ ID NO 2405: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1121 

- gi No. 1762949 

- Description: (U66271) ORF; able to induce HR-like lesions 
[Nicotiana tabacum] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2406: from 54 to 72 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2407 

- Ceres seq_id 1600977 

- Location of start within SEQ ID NO 2405: at 24 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1122 

- gi No. 1762949 

- Description: (U66271) ORF; able to induce HR-like lesions 
[Nicotiana tabacum] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2407: from 47 to 65 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2408 

- Ceres seq_id 1600978 

- Location of start within SEQ ID NO 2405: at 162 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1123 

- gi No. 1762949 

- Description: (U66271) ORF; able to induce HR-like lesions 
[Nicotiana tabacum] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2408: from 1 to 19 

Maximum Length Sequence : 

related to: 
Clone IDs: 

330717 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2409

- Ceres seq_id 1600993
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2410 

- Ceres seq_id 1600994 

- Location of start within SEQ ID NO 2409: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2411 

- Ceres seq_id 1600995 

- Location of start within SEQ ID NO 2409: at 123 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 

- Location within SEQ ID NO 2411: from 6 to 85 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2412 

- Ceres seq_id 1600996 

- Location of start within SEQ ID NO 2409: at 159 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S8 

- Location within SEQ ID NO 2412: from 1 to 73 aa.



Maximum Length Sequence: 

related to: 
Clone IDs: 

330781 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2413 

- Ceres seq_id 1601004 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2414 

- Ceres seq_id 1601005 

- Location of start within SEQ ID NO 2413: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1124

- gi No. 121950 

- Description: HISTONE H1 >gi|22321|emb|CAA40362| (X57077) H1
histone [Zea mays]

- % Identity: 94.8 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 2414: from 32 to 108 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2415

- Ceres seq_id 1601006

- Location of start within SEQ ID NO 2413: at 96 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1125 

- gi No. 121950 

- Description: HISTONE H1 >gi|22321|emb|CAA40362| (X57077) H1
histone [Zea mays]

- % Identity: 94.8 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 2415: from 1 to 77 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2416 

- Ceres seq_id 1601007 

- Location of start within SEQ ID NO 2413: at 136 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- linker histone H1 and H5 family

- Location within SEQ ID NO 2416: from 53 to 104 aa . 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

330855 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2417

- Ceres seq_id 1601012 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2418 

- Ceres seq_id 1601013 

- Location of start within SEQ ID NO 2417: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1126

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 82 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 2418: from 35 to 145 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2419 

- Ceres seq_id 1601014 

- Location of start within SEQ ID NO 2417: at 99 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1127 

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 82 

- Alignment Length: 111 
- Location of Alignment in SEQ ID NO 2419: from 3 to 113

Maximum Length Sequence: 

related to: 
Clone IDs: 

330858 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2420 

- Ceres seq_id 1601015
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2421 

- Ceres seq_id 1601016 

- Location of start within SEQ ID NO 2420: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2422 

- Ceres seq_id 1601017 

- Location of start within SEQ ID NO 2420: at 152 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1128

- gi No. 2499334 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 11 KD SUBUNIT
(COMPLEX I-11KD) (CI-11KD) 

- % Identity: 83.3 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 2422: from 2 to 19 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2423 

- Ceres seq_id 1601018 

- Location of start within SEQ ID NO 2420: at 164 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1129 

- gi No. 2499334 

- Description: NADH-UBIQUINONE OXIDOREDUCTASE 11 KD SUBUNIT
(COMPLEX I-11KD) (CI-11KD) 

- % Identity: 83.3 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 2423: from 1 to 15 

Maximum Length Sequence: 

related to: 
Clone IDs: 

331317 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2424 

- Ceres seq_id 1601029
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2425 

- Ceres seq_id 1601030 

- Location of start within SEQ ID NO 2424: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1130 

- gi No. 3098458 

- Description: (AF040712) 60S ribosomal protein L37A [Cryptochiton
stelleri]

- % Identity: 81.8 

- Alignment Length: 33 

- Location of Alignment in SEQ ID NO 2425: from 31 to 63 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2426 

- Ceres seq_id 1601031 

- Location of start within SEQ ID NO 2424: at 131 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2427 

- Ceres seq_id 1601032 

- Location of start within SEQ ID NO 2424: at 143 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to:
Clone IDs: 

331369 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2428 

- Ceres seq_id 1601040 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2429 

- Ceres seq_id 1601041 

- Location of start within SEQ ID NO 2428: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Thioredoxin 

- Location within SEQ ID NO 2429: from 55 to 160 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1131

- gi No. 99991

- Description: protein disulfide-isomerase (EC 5.3.4.1) - alfalfa
(clone GI) (fragments)

- % Identity: 88.2

- Alignment Length: 17

- Location of Alignment in SEQ ID NO 2429: from 78 to 94

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2430

- Ceres seq_id 1601042

- Location of start within SEQ ID NO 2428: at 59 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Thioredoxin 

- Location within SEQ ID NO 2430: from 36 to 141 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1132 

- gi No. 99991 

- Description: protein disulfide-isomerase (EC 5.3.4.1) - alfalfa
(clone GI) (fragments) 

- % Identity: 88.2 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2430: from 59 to 75 



Maximum Length Sequence: 

related to: 
Clone IDs: 

331380 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2431 

- Ceres seq_id 1601051 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2432 

- Ceres seq_id 1601052 

- Location of start within SEQ ID NO 2431: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2433 

- Ceres seq_id 1601053

- Location of start within SEQ ID NO 2431: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2434 

- Ceres seq_id 1601054

- Location of start within SEQ ID NO 2431: at 110 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Sugar (and other) transporter

- Location within SEQ ID NO 2434: from 22 to 92 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1133 

- gi No. 2160188 

- Description: (AC000132) Similar to Vicia sucrose transport 
protein (gb|Z93774). [Arabidopsis thaliana] 

- % Identity: 71.6 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2434: from 2 to 92 



Maximum Length Sequence : 

related to: 
Clone IDs: 
331422

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2435 

- Ceres seq_id 1601061 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2436 

- Ceres seq_id 1601062 

- Location of start within SEQ ID NO 2435: at 106 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1134 

- gi No. 4204270 

- Description: (AC005223) branched-chain alpha-keto acid 
decarboxylase E1 beta subunit [Arabidopsis thaliana]

- % Identity: 74.6 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 2436: from 14 to 130 

Maximum Length Sequence: 

related to: 
Clone IDs: 

331444 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2437 

- Ceres seq_id 1601064 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2438 

- Ceres seq_id 1601065 

- Location of start within SEQ ID NO 2437: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1135 

- gi No. 3122234 

- Description: EUKARYOTIC TRANSLATION INITIATION FACTOR 2 BETA 
SUBUNIT (EIF-2-BETA) (P38) >gi|2306768 (U87163) eIF-2 beta subunit [Triticum
aestivum] 

- % Identity: 72.9 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2438: from 50 to 133 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2439 

- Ceres seq_id 1601066

- Location of start within SEQ ID NO 2437: at 149 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1136 

- gi No. 3122234 

- Description: EUKARYOTIC TRANSLATION INITIATION FACTOR 2 BETA 
SUBUNIT (EIF-2-BETA) (P38) >gi|2306768 (U87163) eIF-2 beta subunit [Triticum
aestivum] 

- % Identity: 72.9 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2439: from 1 to 84 
Maximum Length Sequence:

related to: 
Clone IDs: 

331533 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2440 

- Ceres seq_id 1601078 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2441 

- Ceres seq_id 1601079 

- Location of start within SEQ ID NO 2440: at 100 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2441: from 6 to 131 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1137 

- gi No. 130709 

- Description: SERINE/THREONINE PROTEIN PHOSPHATASE PP1
>gi|322875|pir||S29317 phosphoprotein phosphatase (EC 3.1.3.16) 1 - maize
>gi|168723 (M60215) protein phosphatase-1 [Zea mays] >gi|445586|prf||1909338A
protein phosphatase 1 [Zea mays]

- % Identity: 80.3 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 2441: from 1 to 131 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2442 

- Ceres seq_id 1601080

- Location of start within SEQ ID NO 2440: at 185 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

331626 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2443 

- Ceres seq_id 1601089 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2444

- Ceres seq_id 1601090 

- Location of start within SEQ ID NO 2443: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2445 

- Ceres seq_id 1601091 

- Location of start within SEQ ID NO 2443: at 193 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
- SRF-type transcription factor (DNA-binding and dimerisation
domain)

- Location within SEQ ID NO 2445: from 1 to 59 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1138 

- gi No. 5295986 

- Description: (AB003326) MADS box-like protein [Oryza sativa] 

- % Identity: 81.5 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 2445: from 1 to 92 

Maximum Length Sequence: 

related to: 
Clone IDs: 

331654 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2446 

- Ceres seq_id 1601092 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2447 

- Ceres seq_id 1601093 

- Location of start within SEQ ID NO 2446: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1139 

- gi No. 66616 

- Description: glutathione transferase (EC 2.5.1.18) III (version 
1) - maize >gi|22280|emb|CAA28053| (X04455) GSTIII (aa 1-220) [Zea mays]
>gi|22319|emb|CAA27957| (X04375) GSTIII (aa 1-220) [Zea mays]

- % Identity: 80.8 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 2447: from 27 to 52 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2448 

- Ceres seq_id 1601094

- Location of start within SEQ ID NO 2446: at 79 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1140

- gi No. 66616 

- Description: glutathione transferase (EC 2.5.1.18) III (version 
1) - maize >gi|22280|emb|CAA28053| (X04455) GSTIII (aa 1-220) [Zea mays]
>gi|22319|emb|CAA27957| (X04375) GSTIII (aa 1-220) [Zea mays]

- % Identity: 80.8 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 2448: from 1 to 26 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2449 

- Ceres seq_id 1601095 

- Location of start within SEQ ID NO 2446: at 103 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 
- Alignment No. 1141

- gi No. 66616

- Description: glutathione transferase (EC 2.5.1.18) III (version 
1) - maize >gi|22280|emb|CAA28053| (X04455) GSTIII (aa 1-220) [Zea mays]
>gi|22319|emb|CAA27957| (X04375) GSTIII (aa 1-220) [Zea mays]

- % Identity: 80.8 

- Alignment Length: 26

- Location of Alignment in SEQ ID NO 2449: from 1 to 18

Maximum Length Sequence : 

related to: 
Clone IDs: 

332032 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2450 

- Ceres seq_id 1601118 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2451 

- Ceres seq_id 1601119 

- Location of start within SEQ ID NO 2450: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1142 

- gi No. 1666173 

- Description: (Y09106) transcription factor [Nicotiana 
plumbaginifolia]

- % Identity: 74.2 

- Alignment Length: 128 

- Location of Alignment in SEQ ID NO 2451: from 44 to 170 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2452 

- Ceres seq_id 1601120 

- Location of start within SEQ ID NO 2450: at 131 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1143

- gi No. 1666173 

- Description: (Y09106) transcription factor [Nicotiana 
plumbaginifolia]

- % Identity: 74.2 

- Alignment Length: 128 

- Location of Alignment in SEQ ID NO 2452: from 1 to 127 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2453 

- Ceres seq_id 1601121 

- Location of start within SEQ ID NO 2450: at 155 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1144

- gi No. 1666173 

- Description: (Y09106) transcription factor [Nicotiana 
plumbaginifolia]

- % Identity: 74.2 



Table 1 
Page 511 



Attorney Docket No. 2750-1237P Table 1 

Client Docket No. 80146.003 Page 512 

- Alignment Length: 128 

- Location of Alignment in SEQ ID NO 2453: from 1 to 119 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332203 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2454 

- Ceres seq_id 1601122 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2455 

- Ceres seq_id 1601123 

- Location of start within SEQ ID NO 2454: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 2455: from 62 to 154 aa. 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2456 

- Ceres seq_id 1601124 

- Location of start within SEQ ID NO 2454: at 140 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Eukaryotic protein kinase domain 

- Location within SEQ ID NO 2456: from 16 to 108 aa. 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence: 

related to: 
Clone IDs: 

332204 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2457 

- Ceres seq_id 1601125 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2458 

- Ceres seq_id 1601126 

- Location of start within SEQ ID NO 2457: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s ) 

- Bacterial mutT protein 

- Location within SEQ ID NO 2458: from 93 to 132 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1145 

- gi No. 2564253 

- Description: (Z99996) diadenosine 5',5'''-P1,P4-tetraphosphate
hydrolase [Hordeum vulgare] 

- % Identity: 70 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 2458: from 14 to 143 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2459

- Ceres seq_id 1601127 

- Location of start within SEQ ID NO 2457: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Collagen triple helix repeat (20 copies) 

- Location within SEQ ID NO 2459: from 3 to 49 aa . 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2460 

- Ceres seq_id 1601128 

- Location of start within SEQ ID NO 2457: at 31 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Bacterial mutT protein 

- Location within SEQ ID NO 2460: from 83 to 122 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1146

- gi No. 2564253 

- Description: (Z99996) diadenosine 5',5'''-P1,P4-tetraphosphate
hydrolase [Hordeum vulgare] 

- % Identity: 70

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 2460: from 4 to 133



Maximum Length Sequence : 

related to: 
Clone IDs: 

332362 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2461

- Ceres seq_id 1601141
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2462 

- Ceres seq_id 1601142 

- Location of start within SEQ ID NO 2461: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2463 

- Ceres seq_id 1601143 

- Location of start within SEQ ID NO 2461: at 70 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2463: from 4 to 121 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1147 

- gi No. 1154954 

- Description: (X94693) histone H2A [Triticum aestivum] 

- % Identity: 96.9 
- Alignment Length: 130

- Location of Alignment in SEQ ID NO 2463: from 1 to 128

Maximum Length Sequence: 

related to: 
Clone IDs: 

332381 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2464 

- Ceres seq_id 1601144 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2465 

- Ceres seq_id 1601145

- Location of start within SEQ ID NO 2464: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- short chain dehydrogenase 

- Location within SEQ ID NO 2465: from 95 to 154 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2466

- Ceres seq_id 1601146 

- Location of start within SEQ ID NO 2464: at 186 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- short chain dehydrogenase 

- Location within SEQ ID NO 2466: from 34 to 93 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332440 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2467 

- Ceres seq_id 1601155 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2468

- Ceres seq_id 1601156 

- Location of start within SEQ ID NO 2467: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2469 

- Ceres seq_id 1601157 

- Location of start within SEQ ID NO 2467: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1148 

- gi No. 2134211 

- Description: protamine II-5 - painted turtle 
- % Identity: 78.6

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2469: from 80 to 93 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2470 

- Ceres seq_id 1601158

- Location of start within SEQ ID NO 2467: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

332522 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2471 

- Ceres seq_id 1601162 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2472 

- Ceres seq_id 1601163 

- Location of start within SEQ ID NO 2471: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2473 

- Ceres seq_id 1601164

- Location of start within SEQ ID NO 2471: at 253 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Tubulin 

- Location within SEQ ID NO 2473: from 1 to 52 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1149 

- gi No. 135399 

- Description: TUBULIN ALPHA-1 CHAIN >gi|100716|pir||S20758 tubulin
alpha-1 chain - rice >gi|20379|emb|CAA77988| (Z11931) alpha 1 tubulin [Oryza
sativa] >gi|1136124|emb|CAA62918| (X91808) alfa-tubulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 52 

- Location of Alignment in SEQ ID NO 2473: from 1 to 52 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2474 

- Ceres seq_id 1601165 

- Location of start within SEQ ID NO 2471: at 350 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1150 

- gi No. 135399 



- Description: TUBULIN ALPHA-1 CHAIN >gi|100716|pir||S20758 tubulin
alpha-1 chain - rice >gi|20379|emb|CAA77988| (Z11931) alpha 1 tubulin [Oryza
sativa] >gi|1136124|emb|CAA62918| (X91808) alfa-tubulin [Oryza sativa]

- % Identity: 97.1 

- Alignment Length: 35 

- Location of Alignment in SEQ ID NO 2474: from 21 to 54 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332563 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2475 

- Ceres seq_id 1601168 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2476

- Ceres seq_id 1601169

- Location of start within SEQ ID NO 2475: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2477 

- Ceres seq_id 1601170 

- Location of start within SEQ ID NO 2475: at 98 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DEAD/DEAH box helicase

- Location within SEQ ID NO 2477: from 47 to 136 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1151 

- gi No. 1170507 

- Description: EUKARYOTIC INITIATION FACTOR 4A-3 (EIF-4A-3)
>gi|100276|pir||S22579 translation initiation factor eIF-4A - curled-leaved
tobacco >gi|19699|emb|CAA43514| (X61206) nicotiana eukaryotic translation
initiation factor 4A [Nicotiana

- % Identity: 80.2 

- Alignment Length: 121 

- Location of Alignment in SEQ ID NO 2477: from 17 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2478 

- Ceres seq_id 1601171 

- Location of start within SEQ ID NO 2475: at 146 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2478: from 31 to 120 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1152 

- gi No. 1170507 

- Description: EUKARYOTIC INITIATION FACTOR 4A-3 (EIF-4A-3)
>gi|100276|pir||S22579 translation initiation factor eIF-4A - curled-leaved
tobacco >gi|19699|emb|CAA43514| (X61206) nicotiana eukaryotic translation
initiation factor 4A [Nicotiana

- % Identity: 80.2

- Alignment Length: 121 

- Location of Alignment in SEQ ID NO 2478: from 1 to 120 



Maximum Length Sequence: 

related to: 
Clone IDs: 

332589 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2479 

- Ceres seq_id 1601183
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2480

- Ceres seq_id 1601184 

- Location of start within SEQ ID NO 2479: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2481 

- Ceres seq_id 1601185 

- Location of start within SEQ ID NO 2479: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2482 

- Ceres seq_id 1601186 

- Location of start within SEQ ID NO 2479: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1153 

- gi No. 1330351 

- Description: (U58755) C34D4.1 gene product [Caenorhabditis
elegans]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2482: from 46 to 56



Maximum Length Sequence : 

related to: 
Clone IDs: 

332598 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2483 

- Ceres seq_id 1601191 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2484 

- Ceres seq_id 1601192 

- Location of start within SEQ ID NO 2483: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1154

- gi No. 465445

- Description: PROBABLE NUCLEAR ANTIGEN >gi|418708|pir||B45344
probable nuclear antigen - suid herpesvirus 1 (strain Kaplan) >gi|334072
(M34651) ORF-3 protein [Pseudorabies virus]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2484: from 26 to 37 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2485 

- Ceres seq_id 1601193 

- Location of start within SEQ ID NO 2483: at 398 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1155 

- gi No. 2239260 

- Description: (Y13734) cinnamoyl CoA reductase [Zea mays] 

- % Identity: 99.3 

- Alignment Length: 140 

- Location of Alignment in SEQ ID NO 2485: from 1 to 31 

Maximum Length Sequence : 

related to: 
Clone IDs: 

332962 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2486

- Ceres seq_id 1601207 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2487 

- Ceres seq_id 1601208 

- Location of start within SEQ ID NO 2486: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2488 

- Ceres seq_id 1601209 

- Location of start within SEQ ID NO 2486: at 70 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2488: from 4 to 57 aa. 



(Dp) Related Amino Acid Sequences 



Maximum Length Sequence: 

related to: 
Clone IDs: 

333025 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2489 

- Ceres seq_id 1601210 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2490

- Ceres seq_id 1601211

- Location of start within SEQ ID NO 2489: at 114 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L24e 

- Location within SEQ ID NO 2490: from 3 to 73 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1156

- gi No. 1710521 

- Description: 60S RIBOSOMAL PROTEIN L24 >gi|1154859|emb|CAA63960|
(X94296) L24 ribosomal protein [Hordeum vulgare] 

- % Identity: 94.9 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 2490: from 1 to 138 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2491 

- Ceres seq_id 1601212 

- Location of start within SEQ ID NO 2489: at 279 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1157 

- gi No. 1710521 

- Description: 60S RIBOSOMAL PROTEIN L24 >gi|1154859|emb|CAA63960|
(X94296) L24 ribosomal protein [Hordeum vulgare] 

- % Identity: 94.9 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 2491: from 1 to 83



Maximum Length Sequence: 

related to: 
Clone IDs: 

333252 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2492

- Ceres seq_id 1601224 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2493

- Ceres seq_id 1601225 

- Location of start within SEQ ID NO 2492: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2493: from 28 to 154 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1158 

- gi No. 2358287 

- Description: (AF010404) ALR [Homo sapiens] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2493: from 145 to 157 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2494 

- Ceres seq_id 1601226 

- Location of start within SEQ ID NO 2492: at 3 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2495 

- Ceres seq_id 1601227 

- Location of start within SEQ ID NO 2492: at 114 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

333283 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2496 

- Ceres seq_id 1601232 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2497

- Ceres seq_id 1601233 

- Location of start within SEQ ID NO 2496: at 112 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1159

- gi No. 3550985

- Description: (AB010740) OsS5a [Oryza sativa]

- % Identity: 89.5 

- Alignment Length: 143 

- Location of Alignment in SEQ ID NO 2497: from 1 to 143 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2498 

- Ceres seq_id 1601234 

- Location of start within SEQ ID NO 2496: at 130 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1160 

- gi No. 3550985 

- Description: (AB010740) OsS5a [Oryza sativa] 

- % Identity: 89.5 

- Alignment Length: 143

- Location of Alignment in SEQ ID NO 2498: from 1 to 137 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2499

- Ceres seq_id 1601235

- Location of start within SEQ ID NO 2496: at 157 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1161 

- gi No. 3550985 
- Description: (AB010740) OsS5a [Oryza sativa]

- % Identity: 89.5 

- Alignment Length: 143 

- Location of Alignment in SEQ ID NO 2499: from 1 to 128



Maximum Length Sequence: 

related to: 
Clone IDs: 

333570 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2500 

- Ceres seq_id 1601272 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2501 

- Ceres seq_id 1601273 

- Location of start within SEQ ID NO 2500: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2502 

- Ceres seq_id 1601274 

- Location of start within SEQ ID NO 2500: at 150 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2502: from 48 to 76 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1162 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 2502: from 1 to 78 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2503

- Ceres seq_id 1601275 

- Location of start within SEQ ID NO 2500: at 258 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 2503: from 12 to 40 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1163 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100

- Alignment Length: 79

- Location of Alignment in SEQ ID NO 2503: from 1 to 42



Maximum Length Sequence: 

related to: 
Clone IDs: 

333637 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2504 

- Ceres seq_id 1601276 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2505 

- Ceres seq_id 1601277 

- Location of start within SEQ ID NO 2504: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2506 

- Ceres seq_id 1601278 

- Location of start within SEQ ID NO 2504: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein family L1e

- Location within SEQ ID NO 2506: from 72 to 136 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1164 

- gi No. 3914700 

- Description: PROBABLE 60S RIBOSOMAL PROTEIN L1 >gi|1947110
(AF000196) Similar to ribosomal protein L1; coded for by C. elegans cDNA
CEMSF45F; coded for by C. elegans cDNA cmllcS; coded for by C. for by C.
elegans cDNA yk82h2.3; coded ...

- % Identity: 76.7 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 2506: from 72 to 131 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2507 

- Ceres seq_id 1601279 

- Location of start within SEQ ID NO 2504: at 47 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein family L1e

- Location within SEQ ID NO 2507: from 57 to 121 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1165 

- gi No. 3914700 

- Description: PROBABLE 60S RIBOSOMAL PROTEIN L1 >gi|1947110
(AF000196) Similar to ribosomal protein L1; coded for by C. elegans cDNA
CEMSF45F; coded for by C. elegans cDNA cmllcS; coded for by C. for by C.
elegans cDNA yk82h2.3; coded ...

- % Identity: 76.7 

- Alignment Length: 60 

- Location of Alignment in SEQ ID NO 2507: from 57 to 116 



Maximum Length Sequence:

related to:
Clone IDs: 

333643 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2508 

- Ceres seq_id 1601280
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2509 

- Ceres seq_id 1601281 

- Location of start within SEQ ID NO 2508: at 236 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2509: from 6 to 74 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1166 

- gi No. 4006877 

- Description: (Z99707) RNA-binding like protein [Arabidopsis thaliana]

- % Identity: 75.9 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 2509: from 1 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2510 

- Ceres seq_id 1601282 

- Location of start within SEQ ID NO 2508: at 242 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2510: from 4 to 72 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1167 

- gi No. 4006877 

- Description: (Z99707) RNA-binding like protein [Arabidopsis thaliana]

- % Identity: 75.9

- Alignment Length: 58

- Location of Alignment in SEQ ID NO 2510: from 1 to 56 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2511 

- Ceres seq_id 1601283 

- Location of start within SEQ ID NO 2508: at 302 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2511: from 1 to 52 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1168 

- gi No. 4006877 

- Description: (Z99707) RNA-binding like protein [Arabidopsis thaliana]

- % Identity: 75.9

- Alignment Length: 58

- Location of Alignment in SEQ ID NO 2511: from 1 to 36



Maximum Length Sequence: 

related to: 
Clone IDs: 

333746 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2512 

- Ceres seq_id 1601288 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2513 

- Ceres seq_id 1601289 

- Location of start within SEQ ID NO 2512: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Mov34 family 

- Location within SEQ ID NO 2513: from 59 to 163 aa. 
(Dp) Related Amino Acid Sequences 



- Alignment No. 1169 

- gi No. 5031981 

- Description: ref|NP_005796.1|pPOH1| 26S proteasome-associated
pad1 homolog >gi|1923256 (U86782) 26S proteasome-associated pad1 homolog
[Homo sapiens]

- % Identity: 86.3 

- Alignment Length: 131 

- Location of Alignment in SEQ ID NO 2513: from 36 to 163 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2514 

- Ceres seq_id 1601290 

- Location of start within SEQ ID NO 2512: at 108 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mov34 family 

- Location within SEQ ID NO 2514: from 24 to 128 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1170

- gi No. 5031981

- Description: ref|NP_005796.1|pPOH1| 26S proteasome-associated
pad1 homolog >gi|1923256 (U86782) 26S proteasome-associated pad1 homolog
[Homo sapiens]

- % Identity: 86.3 

- Alignment Length: 131 

- Location of Alignment in SEQ ID NO 2514: from 1 to 128 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2515 

- Ceres seq_id 1601291 

- Location of start within SEQ ID NO 2512: at 144 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mov34 family 

- Location within SEQ ID NO 2515: from 12 to 116 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 1171

- gi No. 5031981

- Description: ref|NP_005796.1|pPOH1| 26S proteasome-associated
pad1 homolog >gi|1923256 (U86782) 26S proteasome-associated pad1 homolog
[Homo sapiens]

- % Identity: 86.3 

- Alignment Length: 131 

- Location of Alignment in SEQ ID NO 2515: from 1 to 116 

Maximum Length Sequence:

related to: 
Clone IDs: 

333790 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2516 

- Ceres seq_id 1601294
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2517 

- Ceres seq_id 1601295 

- Location of start within SEQ ID NO 2516: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1172

- gi No. 1711036

- Description: (U78952) hydroxyproline rich glycoprotein PsHRGP1
[Pisum sativum] 

- % Identity: 81.4 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 2517: from 44 to 160 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2518 

- Ceres seq_id 1601296 

- Location of start within SEQ ID NO 2516: at 133 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1173 

- gi No. 1711036 

- Description: (U78952) hydroxyproline rich glycoprotein PsHRGP1
[Pisum sativum] 

- % Identity: 81.4 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 2518: from 1 to 116 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334021 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2519 

- Ceres seq_id 1601303 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2520 

- Ceres seq_id 1601304

- Location of start within SEQ ID NO 2519: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Sm protein

- Location within SEQ ID NO 2520: from 49 to 114 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1174 

- gi No. 4263519 

- Description: (AC004044) small nuclear riboprotein Sm-D1
[Arabidopsis thaliana] 

- % Identity: 92.1 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2520: from 45 to 156 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2521 

- Ceres seq_id 1601305 

- Location of start within SEQ ID NO 2519: at 133 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 2521: from 5 to 70 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1175 

- gi No. 4263519 

- Description: (AC004044) small nuclear riboprotein Sm-D1
[Arabidopsis thaliana] 

- % Identity: 92.1 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2521: from 1 to 112 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2522 

- Ceres seq_id 1601306

- Location of start within SEQ ID NO 2519: at 154 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Sm protein 

- Location within SEQ ID NO 2522: from 1 to 63 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1176

- gi No. 4263519

- Description: (AC004044) small nuclear riboprotein Sm-D1
[Arabidopsis thaliana] 

- % Identity: 92.1 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2522: from 1 to 105 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334204 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2523 

- Ceres seq_id 1601317 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2524 

- Ceres seq_id 1601318 

- Location of start within SEQ ID NO 2523: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2525 

- Ceres seq_id 1601319 

- Location of start within SEQ ID NO 2523: at 113 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2525: from 8 to 91 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1177

- gi No. 4530611

- Description: (AF134552) serine/threonine protein phosphatase
PP2A-2 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 98.9 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 2525: from 1 to 91 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2526 

- Ceres seq_id 1601320

- Location of start within SEQ ID NO 2523: at 158 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2526: from 1 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1178

- gi No. 4530611

- Description: (AF134552) serine/threonine protein phosphatase 
PP2A-2 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 98.9 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 2526: from 1 to 76 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334309 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2527 

- Ceres seq_id 1601325 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2528 

- Ceres seq_id 1601326 

- Location of start within SEQ ID NO 2527: at 89 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Acetyltransferase (GNAT) family

- Location within SEQ ID NO 2528: from 4 to 113 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1179

- gi No. 728880

- Description: N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT
HOMOLOG >gi|517485|emb|CAA54691| (X77588) ARD1 N-acetyl transferase homologue
[Homo sapiens] >gi|1302661 (U52112) ARD1 N-acetyl transferase related protein
[Homo sapiens]

- % Identity: 72.7 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2528: from 4 to 113 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2529 

- Ceres seq_id 1601327

- Location of start within SEQ ID NO 2527: at 131 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Acetyltransferase (GNAT) family 

- Location within SEQ ID NO 2529: from 1 to 99 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1180

- gi No. 728880

- Description: N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT
HOMOLOG >gi|517485|emb|CAA54691| (X77588) ARD1 N-acetyl transferase homologue
[Homo sapiens] >gi|1302661 (U52112) ARD1 N-acetyl transferase related protein
[Homo sapiens]

- % Identity: 72.7

- Alignment Length: 110

- Location of Alignment in SEQ ID NO 2529: from 1 to 99

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2530 

- Ceres seq_id 1601328 

- Location of start within SEQ ID NO 2527: at 149 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Acetyltransferase (GNAT) family 

- Location within SEQ ID NO 2530: from 1 to 93 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1181 

- gi No. 728880 

- Description: N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT
HOMOLOG >gi|517485|emb|CAA54691| (X77588) ARD1 N-acetyl transferase homologue
[Homo sapiens] >gi|1302661 (U52112) ARD1 N-acetyl transferase related protein
[Homo sapiens]

- % Identity: 72.7 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2530: from 1 to 93 



Maximum Length Sequence: 

related to: 
Clone IDs: 

334318 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2531 

- Ceres seq_id 1601333 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2532

- Ceres seq_id 1601334

- Location of start within SEQ ID NO 2531: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2533 

- Ceres seq_id 1601335 

- Location of start within SEQ ID NO 2531: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Pathogenesis-related protein Bet v I family 

- Location within SEQ ID NO 2533: from 1 to 122 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334350 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2534 

- Ceres seq_id 1601340 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2535 

- Ceres seq_id 1601341 

- Location of start within SEQ ID NO 2534: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1182 

- gi No. 665960 

- Description: (U20496) ribosomal protein S20 homolog [Aeromonas 
hydrophila] 

- % Identity: 72.2 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 2535: from 57 to 72 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2536 

- Ceres seq_id 1601342 

- Location of start within SEQ ID NO 2534: at 116 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2536: from 51 to 119 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1183 

- gi No. 4138732 

- Description: (Y17332) proline-rich protein [Zea mays] 

- % Identity: 96.7 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 2536: from 1 to 119

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2537

- Ceres seq_id 1601343

- Location of start within SEQ ID NO 2534: at 249 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334533 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2538 

- Ceres seq_id 1601351 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2539 

- Ceres seq_id 1601352 

- Location of start within SEQ ID NO 2538: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1184

- gi No. 2494041

- Description: DIAMINOPIMELATE EPIMERASE >gi|1653875|dbj|BAA18785|
(D90917) diaminopimelate epimerase [Synechocystis sp.]

- % Identity: 70.8

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2539: from 74 to 162 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2540

- Ceres seq_id 1601353

- Location of start within SEQ ID NO 2538: at 24 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1185

- gi No. 2494041

- Description: DIAMINOPIMELATE EPIMERASE >gi|1653875|dbj|BAA18785|
(D90917) diaminopimelate epimerase [Synechocystis sp.]

- % Identity: 70.8

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2540: from 67 to 155 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2541 

- Ceres seq_id 1601354 

- Location of start within SEQ ID NO 2538: at 156 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1186

- gi No. 2494041

- Description: DIAMINOPIMELATE EPIMERASE >gi|1653875|dbj|BAA18785|
(D90917) diaminopimelate epimerase [Synechocystis sp.]

- % Identity: 70.8

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2541: from 23 to 111

Maximum Length Sequence: 

related to: 
Clone IDs: 

334640 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2542 

- Ceres seq_id 1601362 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2543 

- Ceres seq_id 1601363 

- Location of start within SEQ ID NO 2542: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2544 

- Ceres seq_id 1601364 

- Location of start within SEQ ID NO 2542: at 182 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2545 

- Ceres seq_id 1601365 

- Location of start within SEQ ID NO 2542: at 252 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1187 

- gi No. 3123264 

- Description: 60S RIBOSOMAL PROTEIN L27
>gi|2244857|emb|CAB10279.1| (Z97337) ribosomal protein [Arabidopsis thaliana]

- % Identity: 82.7 

- Alignment Length: 81 

- Location of Alignment in SEQ ID NO 2545: from 1 to 81 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334748 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2546 

- Ceres seq_id 1601378 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2547 

- Ceres seq_id 1601379 

- Location of start within SEQ ID NO 2546: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2548

- Ceres seq_id 1601380

- Location of start within SEQ ID NO 2546: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2548: from 62 to 116 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1188 

- gi No. 3834310 

- Description: (AC005679) Similar to Ubiquitin-conjugating enzyme 
E2-17 KD gb|D83004 from Homo sapiens. ESTs gb|T88233, gb|Z24464, gb|N37265,
gb|H36151, gb|Z34711, gb|AA040983, and gb|T22122 come from this gene.
[Arabidopsis thaliana] 

- % Identity: 93.1 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 2548: from 59 to 116 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2549

- Ceres seq_id 1601381

- Location of start within SEQ ID NO 2546: at 177 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2549: from 4 to 58 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1189 

- gi No. 3834310 

- Description: (AC005679) Similar to Ubiquitin-conjugating enzyme 
E2-17 KD gb|D83004 from Homo sapiens. ESTs gb|T88233, gb|Z24464, gb|N37265,
gb|H36151, gb|Z34711, gb|AA040983, and gb|T22122 come from this gene.
[Arabidopsis thaliana] 

- % Identity: 93.1 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 2549: from 1 to 58

Maximum Length Sequence: 

related to: 
Clone IDs: 

334818 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2550 

- Ceres seq_id 1601386 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2551 

- Ceres seq_id 1601387 

- Location of start within SEQ ID NO 2550: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2552 

- Ceres seq_id 1601388 

- Location of start within SEQ ID NO 2550: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2553 

- Ceres seq_id 1601389 

- Location of start within SEQ ID NO 2550: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2553: from 2 to 109 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334853 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2554 

- Ceres seq_id 1601394 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2555 

- Ceres seq_id 1601395 

- Location of start within SEQ ID NO 2554: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1190 

- gi No. 3046693 

- Description: (AL022140) receptor like protein (fragment) 
[Arabidopsis thaliana] 

- % Identity: 71.4 

- Alignment Length: 28 

- Location of Alignment in SEQ ID NO 2555: from 55 to 81 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2556

- Ceres seq_id 1601396

- Location of start within SEQ ID NO 2554: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2557 

- Ceres seq_id 1601397 

- Location of start within SEQ ID NO 2554: at 88 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1191 

- gi No. 3046693 

- Description: (AL022140) receptor like protein (fragment) 
[Arabidopsis thaliana] 

- % Identity: 71.4

- Alignment Length: 28

- Location of Alignment in SEQ ID NO 2557: from 26 to 52 

Maximum Length Sequence:

related to: 
Clone IDs: 

334919 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2558 

- Ceres seq_id 1601400 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2559 

- Ceres seq_id 1601401 

- Location of start within SEQ ID NO 2558: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SCP-like extracellular protein 

- Location within SEQ ID NO 2559: from 74 to 132 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2560 

- Ceres seq_id 1601402 

- Location of start within SEQ ID NO 2558: at 17 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SCP-like extracellular protein 

- Location within SEQ ID NO 2560: from 69 to 127 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2561

- Ceres seq_id 1601403

- Location of start within SEQ ID NO 2558: at 105 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

334987 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2562 

- Ceres seq_id 1601408 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2563

- Ceres seq_id 1601409

- Location of start within SEQ ID NO 2562: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- tRNA synthetases class I (R) 

- Location within SEQ ID NO 2563: from 1 to 127 aa.



(Dp) Related Amino Acid Sequences

- Alignment No. 1192

- gi No. 2632105

- Description: (Z98760) arginyl-tRNA synthetase [Arabidopsis
thaliana] >gi|4539426|emb|CAB38959.1| (AL049171) arginyl-tRNA synthetase
[Arabidopsis thaliana]

- % Identity: 74 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2563: from 1 to 127 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2564 

- Ceres seq_id 1601410 

- Location of start within SEQ ID NO 2562: at 42 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- tRNA synthetases class I (R) 

- Location within SEQ ID NO 2564: from 1 to 114 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1193 

- gi No. 2632105 

- Description: (Z98760) arginyl-tRNA synthetase [Arabidopsis 
thaliana] >gi|4539426|emb|CAB38959.1| (AL049171) arginyl-tRNA synthetase
[Arabidopsis thaliana] 

- % Identity: 74 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2564: from 1 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2565 

- Ceres seq_id 1601411 

- Location of start within SEQ ID NO 2562: at 63 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- tRNA synthetases class I (R) 

- Location within SEQ ID NO 2565: from 1 to 107 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1194

- gi No. 2632105

- Description: (Z98760) arginyl-tRNA synthetase [Arabidopsis
thaliana] >gi|4539426|emb|CAB38959.1| (AL049171) arginyl-tRNA synthetase
[Arabidopsis thaliana]

- % Identity: 74 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2565: from 1 to 107 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334992 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2566

- Ceres seq_id 1601416
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2567

- Ceres seq_id 1601417

- Location of start within SEQ ID NO 2566: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2567: from 35 to 120 aa.

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2568 

- Ceres seq_id 1601418 

- Location of start within SEQ ID NO 2566: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2569

- Ceres seq_id 1601419

- Location of start within SEQ ID NO 2566: at 123 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

334998 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2570 

- Ceres seq_id 1601420
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2571 

- Ceres seq_id 1601421 

- Location of start within SEQ ID NO 2570: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2572 

- Ceres seq_id 1601422

- Location of start within SEQ ID NO 2570: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1195 

- gi No. 1869883 

- Description: (Z86099) RS1 [human herpesvirus 2]
>gi|1869897|emb|CAB06701| (Z86099) RS1 [human herpesvirus 2]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2572: from 152 to 162 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2573 

- Ceres seq_id 1601423 

- Location of start within SEQ ID NO 2570: at 85 nt.




(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences



Maximum Length Sequence: 

related to: 
Clone IDs: 

335025 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2574 

- Ceres seq_id 1601424
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2575 

- Ceres seq_id 1601425 

- Location of start within SEQ ID NO 2574: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2575: from 89 to 127 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1196 

- gi No. 1346109 

- Description: GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE
PROTEIN (GPB-LR) (RWD) >gi|540535|dbj|BAA07404| (D38231) RWD [Oryza sativa]

- % Identity: 85.8 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2575: from 25 to 150 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2576 

- Ceres seq_id 1601426 

- Location of start within SEQ ID NO 2574: at 73 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2576: from 65 to 103 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1197 

- gi No. 1346109 

- Description: GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE
PROTEIN (GPB-LR) (RWD) >gi|540535|dbj|BAA07404| (D38231) RWD [Oryza sativa]

- % Identity: 85.8

- Alignment Length: 127

- Location of Alignment in SEQ ID NO 2576: from 1 to 126



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2577 

- Ceres seq_id 1601427 

- Location of start within SEQ ID NO 2574: at 112 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- WD domain, G-beta repeat 

- Location within SEQ ID NO 2577: from 52 to 90 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1198

- gi No. 1346109

- Description: GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE
PROTEIN (GPB-LR) (RWD) >gi|540535|dbj|BAA07404| (D38231) RWD [Oryza sativa]

- % Identity: 85.8 

- Alignment Length: 127 

- Location of Alignment in SEQ ID NO 2577: from 1 to 113 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335043 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2578 

- Ceres seq_id 1601428
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2579 

- Ceres seq_id 1601429 

- Location of start within SEQ ID NO 2578: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2580 

- Ceres seq_id 1601430 

- Location of start within SEQ ID NO 2578: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2581 

- Ceres seq_id 1601431 

- Location of start within SEQ ID NO 2578: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1199 

- gi No. 2494503 

- Description: HEPATOCYTE NUCLEAR FACTOR 3 FORKHEAD HOMOLOG 1 (HFH-1)
>gi|2143790|pir||160916 HNF-3/forkhead homolog-1 - rat >gi|550513 (L13201)
HNF-3/forkhead homolog-1 [Rattus norvegicus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2581: from 4 to 14 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335084 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2582 

- Ceres seq_id 1601445 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2583 

- Ceres seq_id 1601446 

- Location of start within SEQ ID NO 2582: at 74 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1200 

- gi No. 99678 

- Description: chlorophyll a/b-binding protein type III - 
Arabidopsis thaliana (fragments) 

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2583: from 49 to 59

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2584 

- Ceres seq_id 1601447 

- Location of start within SEQ ID NO 2582: at 89 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1201 

- gi No. 99678 

- Description: chlorophyll a/b-binding protein type III - 
Arabidopsis thaliana (fragments) 

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2584: from 44 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2585 

- Ceres seq_id 1601448 

- Location of start within SEQ ID NO 2582: at 386 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1202

- gi No. 169061

- Description: (M86906) chlorophyll a/b-binding protein [Pisum sativum]

- % Identity: 83.8 

- Alignment Length: 37 

- Location of Alignment in SEQ ID NO 2585: from 1 to 36 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335166 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2586 

- Ceres seq_id 1601453 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2587 

- Ceres seq_id 1601454 

- Location of start within SEQ ID NO 2586: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2588

- Ceres seq_id 1601455 

- Location of start within SEQ ID NO 2586: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1203 

- gi No. 4506613 

- Description: ref|NP_000974.1|pRPL22| ribosomal protein L22
>gi|464628|sp|P35268|RL22_HUMAN 60S RIBOSOMAL PROTEIN L22 (EPSTEIN-BARR VIRUS
SMALL RNA ASSOCIATED PROTEIN) (EBER ASSOCIATED PROTEIN) (EAP) (HEPARIN
BINDING PROTEIN HBP15) HBp15/L22 [Homo sapiens]

- % Identity: 72.4 

- Alignment Length: 99

- Location of Alignment in SEQ ID NO 2588: from 43 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2589 

- Ceres seq_id 1601456 

- Location of start within SEQ ID NO 2586: at 84 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1204 

- gi No. 4506613 

- Description: ref|NP_000974.1|pRPL22| ribosomal protein L22
>gi|464628|sp|P35268|RL22_HUMAN 60S RIBOSOMAL PROTEIN L22 (EPSTEIN-BARR VIRUS
SMALL RNA ASSOCIATED PROTEIN) (EBER ASSOCIATED PROTEIN) (EAP) (HEPARIN
BINDING PROTEIN HBP15) HBp15/L22 [Homo sapiens]

- % Identity: 72.4 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 2589: from 16 to 113 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335179 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2590 

- Ceres seq_id 1601457 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2591 

- Ceres seq_id 1601458 

- Location of start within SEQ ID NO 2590: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1205 

- gi No. 1143705 

- Description: (X89760) Hox2a [Zea mays] 

- % Identity: 98.8 

- Alignment Length: 163 

- Location of Alignment in SEQ ID NO 2591: from 1 to 162 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2592 

- Ceres seq_id 1601459 

- Location of start within SEQ ID NO 2590: at 77 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1206 

- gi No. 1143705 

- Description: (X89760) Hox2a [Zea mays] 

- % Identity: 98.8 

- Alignment Length: 163 

- Location of Alignment in SEQ ID NO 2592: from 1 to 137 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335373 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2593 

- Ceres seq_id 1601476 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2594 

- Ceres seq_id 1601477

- Location of start within SEQ ID NO 2593: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2595

- Ceres seq_id 1601478 

- Location of start within SEQ ID NO 2593: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2596 

- Ceres seq_id 1601479

- Location of start within SEQ ID NO 2593: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- DnaJ domain 

- Location within SEQ ID NO 2596: from 92 to 159 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

335381 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2597 

- Ceres seq_id 1601480
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2598 

- Ceres seq_id 1601481 

- Location of start within SEQ ID NO 2597: at 98 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Phosphoribosyl pyrophosphate synthetase

- Location within SEQ ID NO 2598: from 83 to 130 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1207

- gi No. 4902849

- Description: (AJ006940) phosphoribosyl pyrophosphate synthase
[Spinacia oleracea]

- % Identity: 88.9

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 2598: from 78 to 130 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335569 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2599 

- Ceres seq_id 1601483 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2600 

- Ceres seq_id 1601484 

- Location of start within SEQ ID NO 2599: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2601 

- Ceres seq_id 1601485 

- Location of start within SEQ ID NO 2599: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2601: from 27 to 134 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1208 

- gi No. 462234 

- Description: HISTONE H2A >gi|419741|pir||S30155 histone H2A -
Norway spruce >gi|297871|emb|CAA48030| (X67819) histone H2A [Picea abies]

- % Identity: 90.9 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 2601: from 26 to 134 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2602

- Ceres seq_id 1601486 

- Location of start within SEQ ID NO 2599: at 77 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2602: from 2 to 109 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 1209

- gi No. 462234

- Description: HISTONE H2A >gi|419741|pir||S30155 histone H2A -
Norway spruce >gi|297871|emb|CAA48030| (X67819) histone H2A [Picea abies]

- % Identity: 90.9 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 2602: from 1 to 109 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335697 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2603 

- Ceres seq_id 1601495 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2604 

- Ceres seq_id 1601496

- Location of start within SEQ ID NO 2603: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1210 

- gi No. 400879 

- Description: PHOTOSYSTEM I REACTION CENTRE SUBUNIT PSAN PRECURSOR
(PSI-N) >gi|479690|pir||S35159 photosystem I chain psaN - barley
>gi|19095|emb|CAA47056| (X66428) photosystem I subunit N [Hordeum vulgare]

- % Identity: 75.2 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 2604: from 32 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2605 

- Ceres seq_id 1601497

- Location of start within SEQ ID NO 2603: at 95 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1211 

- gi No. 400879 

- Description: PHOTOSYSTEM I REACTION CENTRE SUBUNIT PSAN PRECURSOR
(PSI-N) >gi|479690|pir||S35159 photosystem I chain psaN - barley
>gi|19095|emb|CAA47056| (X66428) photosystem I subunit N [Hordeum vulgare]

- % Identity: 75.2 

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 2605: from 1 to 128 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2606 

- Ceres seq_id 1601498 

- Location of start within SEQ ID NO 2603: at 167 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1212 

- gi No. 400879 

- Description: PHOTOSYSTEM I REACTION CENTRE SUBUNIT PSAN PRECURSOR
(PSI-N) >gi|479690|pir||S35159 photosystem I chain psaN - barley
>gi|19095|emb|CAA47056| (X66428) photosystem I subunit N [Hordeum vulgare]

- % Identity: 75.2

- Alignment Length: 130 

- Location of Alignment in SEQ ID NO 2606: from 1 to 104 
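
The three polypeptide entries for clone 335697 report the same hit (gi No. 400879) at different alignment locations. The reported positions appear to be related by the difference in start positions divided by three, with the lower bound clipped to residue 1. The sketch below illustrates that arithmetic only. It assumes the start positions share a reading frame, and the function name `shift_alignment` is hypothetical; nothing here is taken from the application itself.

```python
def shift_alignment(ref_start_nt, other_start_nt, ref_from, ref_to):
    """Re-express an alignment interval (amino acid positions) reported for a
    polypeptide starting at ref_start_nt in the coordinates of a polypeptide
    that starts further downstream at other_start_nt on the same cDNA.

    Assumption: both start positions lie in the same reading frame, so the
    offset between them is a whole number of codons.
    """
    delta_nt = other_start_nt - ref_start_nt
    if delta_nt % 3 != 0:
        raise ValueError("start positions are not in the same reading frame")
    offset_aa = delta_nt // 3          # codons skipped by the later start
    new_from = max(1, ref_from - offset_aa)  # clip to the first residue
    new_to = ref_to - offset_aa
    if new_to < 1:
        raise ValueError("alignment lies entirely upstream of the later start")
    return new_from, new_to


# Values from the entries above: SEQ ID NO 2604 starts at 2 nt and aligns
# from 32 to 159; SEQ ID NO 2605 and 2606 start at 95 nt and 167 nt.
print(shift_alignment(2, 95, 32, 159))
print(shift_alignment(2, 167, 32, 159))
```

With the values listed for SEQ ID NO 2604, the two calls reproduce the intervals given for SEQ ID NO 2605 and SEQ ID NO 2606.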



Maximum Length Sequence: 

related to: 
Clone IDs: 

335824 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2607 

- Ceres seq_id 1601499
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2608 

- Ceres seq_id 1601500 

- Location of start within SEQ ID NO 2607: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2609 

- Ceres seq_id 1601501 

- Location of start within SEQ ID NO 2607: at 270 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1213 

- gi No. 4539677 

- Description: (AF061282) patatin-like protein [Sorghum bicolor] 

- % Identity: 94.1 

- Alignment Length: 68 

- Location of Alignment in SEQ ID NO 2609: from 1 to 67 



Maximum Length Sequence: 

related to: 
Clone IDs: 

335869 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2610 

- Ceres seq_id 1601502 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2611 

- Ceres seq_id 1601503 

- Location of start within SEQ ID NO 2610: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2611: from 102 to 166 aa . 
(Dp) Related Amino Acid Sequences 



- Alignment No. 1214 

- gi No. 2341061 

- Description: (U73459) translational initiation factor eIF-4A [Zea mays]



- % Identity: 100 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 2611: from 49 to 166

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2612 

- Ceres seq_id 1601504 

- Location of start within SEQ ID NO 2610: at 145 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2612: from 54 to 118 aa. 
(Dp) Related Amino Acid Sequences 



- Alignment No. 1215 

- gi No. 2341061 

- Description: (U73459) translational initiation factor eIF-4A [Zea mays]



- % Identity: 100 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 2612: from 1 to 118 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2613

- Ceres seq_id 1601505

- Location of start within SEQ ID NO 2610: at 154 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- DEAD/DEAH box helicase 

- Location within SEQ ID NO 2613: from 51 to 115 aa. 
(Dp) Related Amino Acid Sequences 



- Alignment No. 1216 

- gi No. 2341061 

- Description: (U73459) translational initiation factor eIF-4A [Zea mays]



- % Identity: 100 

- Alignment Length: 118 

- Location of Alignment in SEQ ID NO 2613: from 1 to 115 



Maximum Length Sequence:

related to: 
Clone IDs: 

335893 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2614 

- Ceres seq_id 1601510 
(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 2615 

- Ceres seq_id 1601511 

- Location of start within SEQ ID NO 2614: at 107 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Ubiquitin family 

- Location within SEQ ID NO 2615: from 1 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1217 

- gi No. 82733 

- Description: ubiquitin fusion protein UBF9 - maize >gi|168651
(M68937) ubiquitin fusion protein [Zea mays] >gi|902527 (U29161) ubiquitin
fusion protein [Zea mays] >gi|1589388|prf||2211240B ubiquitin fusion protein
[Zea mays]

- % Identity: 99.1 

- Alignment Length: 116 

- Location of Alignment in SEQ ID NO 2615: from 1 to 116

Maximum Length Sequence: 

related to: 
Clone IDs: 

335977 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2616 

- Ceres seq_id 1601516
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2617 

- Ceres seq_id 1601517 

- Location of start within SEQ ID NO 2616: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- PX domain

- Location within SEQ ID NO 2617: from 57 to 135 aa.
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2618 

- Ceres seq_id 1601518 

- Location of start within SEQ ID NO 2616: at 108 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- PX domain 

- Location within SEQ ID NO 2618: from 22 to 100 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2619 

- Ceres seq_id 1601519 

- Location of start within SEQ ID NO 2616: at 201 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- PX domain

- Location within SEQ ID NO 2619: from 1 to 69 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336051 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2620 

- Ceres seq_id 1601523 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2621 

- Ceres seq_id 1601524 

- Location of start within SEQ ID NO 2620: at 1 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L44 

- Location within SEQ ID NO 2621: from 43 to 119 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1218 

- gi No. 2500380 

- Description: 60S RIBOSOMAL PROTEIN L44 >gi|2119128|pir||JC4923
ribosomal protein RL44 - upland cotton >gi|1553129 (U64677) ribosomal protein
L44 isoform a [Gossypium hirsutum] >gi|1553131 (U64678) ribosomal protein L44
isoform b [Gossypium hirsutum]

- % Identity: 95.2 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 2621: from 25 to 129 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2622 

- Ceres seq_id 1601525 

- Location of start within SEQ ID NO 2620: at 73 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ribosomal protein L44

- Location within SEQ ID NO 2622: from 19 to 95 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1219 

- gi No. 2500380 

- Description: 60S RIBOSOMAL PROTEIN L44 >gi|2119128|pir||JC4923
ribosomal protein RL44 - upland cotton >gi|1553129 (U64677) ribosomal protein
L44 isoform a [Gossypium hirsutum] >gi|1553131 (U64678) ribosomal protein L44
isoform b [Gossypium hirsutum]

- % Identity: 95.2 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 2622: from 1 to 105 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336053 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2623 

- Ceres seq_id 1601526 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2624 

- Ceres seq_id 1601527 

- Location of start within SEQ ID NO 2623: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2625 

- Ceres seq_id 1601528 

- Location of start within SEQ ID NO 2623: at 103 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Adhesion lipoprotein

- Location within SEQ ID NO 2625: from 20 to 87 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1220 

- gi No. 1644232 

- Description: (D67066) N-WASP [Bos taurus]

- % Identity: 71.4

- Alignment Length: 21

- Location of Alignment in SEQ ID NO 2625: from 65 to 84

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2626

- Ceres seq_id 1601529

- Location of start within SEQ ID NO 2623: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Adhesion lipoprotein

- Location within SEQ ID NO 2626: from 15 to 82 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1221

- gi No. 1644232

- Description: (D67066) N-WASP [Bos taurus]

- % Identity: 71.4

- Alignment Length: 21

- Location of Alignment in SEQ ID NO 2626: from 60 to 79

Maximum Length Sequence:

related to:
Clone IDs:

336112

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2627

- Ceres seq_id 1601536
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2628

- Ceres seq_id 1601537

- Location of start within SEQ ID NO 2627: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2629

- Ceres seq_id 1601538

- Location of start within SEQ ID NO 2627: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1222

- gi No. 123696

- Description: SPERM PROTAMINE P1 (CYSTEINE-RICH PROTAMINE)
>gi|348569|pir||S22582 protamine 1 - Saguinus imperator
>gi|4494091|emb|CAA43853.1| (X61678) protamine 1 [Saguinus imperator]

- % Identity: 75

- Alignment Length: 12

- Location of Alignment in SEQ ID NO 2629: from 54 to 64

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2630

- Ceres seq_id 1601539 

- Location of start within SEQ ID NO 2627: at 165 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Xanthine/uracil permeases family 

- Location within SEQ ID NO 2630: from 42 to 108 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

336122 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2631

- Ceres seq_id 1601540 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2632 

- Ceres seq_id 1601541

- Location of start within SEQ ID NO 2631: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2633 

- Ceres seq_id 1601542

- Location of start within SEQ ID NO 2631: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Aldehyde dehydrogenase 

- Location within SEQ ID NO 2633: from 55 to 138 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2634

- Ceres seq_id 1601543 

- Location of start within SEQ ID NO 2631: at 78 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Aldehyde dehydrogenase 

- Location within SEQ ID NO 2634: from 30 to 113 aa.

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to:
Clone IDs:

336189

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2635

- Ceres seq_id 1601547

Attorney Docket No. 2750-1237P
Client Docket No. 80146.003

Table 1
Page 550

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2636

- Ceres seq_id 1601548

- Location of start within SEQ ID NO 2635: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 2636: from 75 to 142 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2637 

- Ceres seq_id 1601549 

- Location of start within SEQ ID NO 2635: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2638 

- Ceres seq_id 1601550

- Location of start within SEQ ID NO 2635: at 83 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2638: from 48 to 115 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336221 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2639 

- Ceres seq_id 1601559 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2640

- Ceres seq_id 1601560 

- Location of start within SEQ ID NO 2639: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2641 

- Ceres seq_id 1601561

- Location of start within SEQ ID NO 2639: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2642 

- Ceres seq_id 1601562 

- Location of start within SEQ ID NO 2639: at 184 nt . 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1223 

- gi No. 119958 

- Description: FERREDOXIN III PRECURSOR (FD III) >gi|168473
(M73831) ferredoxin [Zea mays] >gi|1864001|dbj|BAA19251| (AB001387) Fd III
[Zea mays] >gi|444686|prf||1907324C ferredoxin:ISOTYPE=III [Zea mays]

- % Identity: 100 

- Alignment Length: 87 

- Location of Alignment in SEQ ID NO 2642: from 1 to 87 

Maximum Length Sequence:

related to: 
Clone IDs: 

336233 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2643

- Ceres seq_id 1601567 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2644 

- Ceres seq_id 1601568 

- Location of start within SEQ ID NO 2643: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1224 

- gi No. 121950 

- Description: HISTONE H1 >gi|22321|emb|CAA40362| (X57077) H1
histone [Zea mays]

- % Identity: 95.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 2644: from 32 to 52 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2645

- Ceres seq_id 1601569 

- Location of start within SEQ ID NO 2643: at 95 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1225 

- gi No. 121950 

- Description: HISTONE H1 >gi|22321|emb|CAA40362| (X57077) H1
histone [Zea mays]

- % Identity: 95.2 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 2645: from 1 to 21 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2646

- Ceres seq_id 1601570 

- Location of start within SEQ ID NO 2643: at 135 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- linker histone H1 and H5 family

- Location within SEQ ID NO 2646: from 8 to 76 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1226

- gi No. 121950 

- Description: HISTONE H1 >gi|22321|emb|CAA40362| (X57077) H1
histone [Zea mays]

- % Identity: 96.7 

- Alignment Length: 92 

- Location of Alignment in SEQ ID NO 2646: from 8 to 99 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336249 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2647 

- Ceres seq_id 1601571 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2648 

- Ceres seq_id 1601572 

- Location of start within SEQ ID NO 2647: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1227 

- gi No. 4902879 

- Description: (AJ006943) phosphoribosyl pyrophosphate synthase
isozyme 4 [Spinacia oleracea]

- % Identity: 81.7

- Alignment Length: 120

- Location of Alignment in SEQ ID NO 2648: from 40 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2649 

- Ceres seq_id 1601573 

- Location of start within SEQ ID NO 2647: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2650 

- Ceres seq_id 1601574 

- Location of start within SEQ ID NO 2647: at 82 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1228

- gi No. 4902879

- Description: (AJ006943) phosphoribosyl pyrophosphate synthase
isozyme 4 [Spinacia oleracea]

- % Identity: 81.7

- Alignment Length: 120

- Location of Alignment in SEQ ID NO 2650: from 13 to 132 

Maximum Length Sequence:

related to: 
Clone IDs: 

336268 
(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2651 

- Ceres seq_id 1601575 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2652 

- Ceres seq_id 1601576 

- Location of start within SEQ ID NO 2651: at 179 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 6-phosphogluconate dehydrogenases 

- Location within SEQ ID NO 2652: from 12 to 86 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 1229 

- gi No. 3925225 

- Description: (AF037030) 6-phosphogluconate dehydrogenase 
isoenzyme B [Zea mays] 

- % Identity: 100 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 2652: from 1 to 86 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2653 

- Ceres seq_id 1601577 

- Location of start within SEQ ID NO 2651: at 218 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 6-phosphogluconate dehydrogenases 

- Location within SEQ ID NO 2653: from 1 to 73 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1230 

- gi No. 3925225 

- Description: (AF037030) 6-phosphogluconate dehydrogenase 
isoenzyme B [Zea mays] 

- % Identity: 100 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 2653: from 1 to 73 



Maximum Length Sequence: 

related to: 
Clone IDs: 

336313 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2654 

- Ceres seq_id 1601582 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2655 

- Ceres seq_id 1601583 

- Location of start within SEQ ID NO 2654: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1231

- gi No. 2384758

- Description: (AF016896) GDP dissociation inhibitor protein OsGDI1
[Oryza sativa]

- % Identity: 100 






- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 2655: from 38 to 60 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2656

- Ceres seq_id 1601584 

- Location of start within SEQ ID NO 2654: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2657 

- Ceres seq_id 1601585 

- Location of start within SEQ ID NO 2654: at 270 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GDP dissociation inhibitor 

- Location within SEQ ID NO 2657: from 3 to 72 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1232 

- gi No. 2384758 

- Description: (AF016896) GDP dissociation inhibitor protein OsGDI1
[Oryza sativa] 

- % Identity: 94.4 

- Alignment Length: 71 

- Location of Alignment in SEQ ID NO 2657: from 3 to 72 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336402 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2658 

- Ceres seq_id 1601586 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2659 

- Ceres seq_id 1601587 

- Location of start within SEQ ID NO 2658: at 227 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1233 

- gi No. 2239260 

- Description: (Y13734) cinnamoyl CoA reductase [Zea mays] 

- % Identity: 100 

- Alignment Length: 85 

- Location of Alignment in SEQ ID NO 2659: from 1 to 85 



Maximum Length Sequence: 

related to: 
Clone IDs: 

336559 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2660 

- Ceres seq_id 1601596 
(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2661

- Ceres seq_id 1601597

- Location of start within SEQ ID NO 2660: at 89 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1234 

- gi No. 387050 

- Description: (M15825) nucleolin, C23 [Cricetulus griseus] 

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2661: from 59 to 70 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336566 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2662 

- Ceres seq_id 1601598 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2663 

- Ceres seq_id 1601599 

- Location of start within SEQ ID NO 2662: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1235 

- gi No. 1170396 

- Description: SPERM PROTAMINE P1 >gi|598423 (L35450) protamine P1
[Macropus eugenii] >gi|1582116|prf||2117429J protamine P1 [Macropus eugenii]

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2663: from 43 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2664 

- Ceres seq_id 1601600 

- Location of start within SEQ ID NO 2662: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2665 

- Ceres seq_id 1601601 

- Location of start within SEQ ID NO 2662: at 47 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336699 

(Ac) cDNA Polynucleotide Sequence 
- Pat. Appln. SEQ ID NO 2666 
- Ceres seq_id 1601602
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2667 

- Ceres seq_id 1601603

- Location of start within SEQ ID NO 2666: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1236

- gi No. 2231312 

- Description: (U75603) AtRab18 [Arabidopsis thaliana]

- % Identity: 81 

- Alignment Length: 58 

- Location of Alignment in SEQ ID NO 2667: from 31 to 88 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2668 

- Ceres seq_id 1601604 

- Location of start within SEQ ID NO 2666: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336702 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2669 

- Ceres seq_id 1601605 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2670 

- Ceres seq_id 1601606 

- Location of start within SEQ ID NO 2669: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2671 

- Ceres seq_id 1601607 

- Location of start within SEQ ID NO 2669: at 101 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1237 

- gi No. 2493053 

- Description: ATP SYNTHASE EPSILON CHAIN, MITOCHONDRIAL >gi|639793
(L39120) mitochondrial F1F0 ATP synthase epsilon subunit [Zea mays]

- % Identity: 100 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 2671: from 1 to 70 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2672 

- Ceres seq_id 1601608 

- Location of start within SEQ ID NO 2669: at 146 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1238 

- gi No. 2493053 

- Description: ATP SYNTHASE EPSILON CHAIN, MITOCHONDRIAL >gi|639793
(L39120) mitochondrial F1F0 ATP synthase epsilon subunit [Zea mays]

- % Identity: 100 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 2672: from 1 to 55 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336720 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2673 

- Ceres seq_id 1601609 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2674 

- Ceres seq_id 1601610

- Location of start within SEQ ID NO 2673: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2675 

- Ceres seq_id 1601611 

- Location of start within SEQ ID NO 2673: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 2675: from 44 to 148 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2676

- Ceres seq_id 1601612 

- Location of start within SEQ ID NO 2673: at 111 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- NAD dependent epimerase/dehydratase family 

- Location within SEQ ID NO 2676: from 8 to 112 aa . 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

336750 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2677 

- Ceres seq_id 1601616
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2678 
- Ceres seq_id 1601617

- Location of start within SEQ ID NO 2677: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1239 

- gi No. 1418321 

- Description: (X98669) C2H2 zinc finger protein [Arabidopsis 
thaliana] >gi|2317903 (U89959) C2H2 zinc finger protein [Arabidopsis
thaliana] 

- % Identity: 81.8 

- Alignment Length: 22 

- Location of Alignment in SEQ ID NO 2678: from 150 to 170 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2679 

- Ceres seq_id 1601618

- Location of start within SEQ ID NO 2677: at 141 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2680 

- Ceres seq_id 1601619 

- Location of start within SEQ ID NO 2677: at 207 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

330798 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2681 

- Ceres seq_id 1601636
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2682 

- Ceres seq_id 1601637

- Location of start within SEQ ID NO 2681: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1240 

- gi No. 131039 

- Description: PROTAMINE Z (CLUPEINE Z) >gi|70811|pir||CLHRZ
protamine Z - Pacific herring >gi|70812|pir||CLHRZA protamine Z - Atlantic
herring >gi|999871|pdb|7INS|G Sus scrofa

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2682: from 19 to 31 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2683 

- Ceres seq_id 1601638 

- Location of start within SEQ ID NO 2681: at 3 nt. 
(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1241 

- gi No. 85651 

- Description: protamine Z2 - striped bonito 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2683: from 58 to 68 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2684 

- Ceres seq_id 1601639 

- Location of start within SEQ ID NO 2681: at 170 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

330847 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2685 

- Ceres seq_id 1601640 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2686

- Ceres seq_id 1601641 

- Location of start within SEQ ID NO 2685: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2687 

- Ceres seq_id 1601642 

- Location of start within SEQ ID NO 2685: at 127 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1242 

- gi No. 2501189 

- Description: THIAMINE BIOSYNTHETIC ENZYME 1-1 PRECURSOR 
>gi|2130146|pir||S61419 thiamine biosynthetic enzyme thi1-1 - maize
>gi|596078 (U17350) thiamine biosynthetic enzyme [Zea mays]

- % Identity: 70.6 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 2687: from 1 to 58

Maximum Length Sequence: 

related to: 
Clone IDs: 

331112 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2688 

- Ceres seq_id 1601653 
(B) Polypeptide Sequence 
- Pat. Appln. SEQ ID NO 2689

- Ceres seq_id 1601654 

- Location of start within SEQ ID NO 2688: at 1 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2690 

- Ceres seq_id 1601655 

- Location of start within SEQ ID NO 2688: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protein phosphatase 2C 

- Location within SEQ ID NO 2690: from 90 to 150 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2691 

- Ceres seq_id 1601656 

- Location of start within SEQ ID NO 2688: at 44 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Protein phosphatase 2C 

- Location within SEQ ID NO 2691: from 76 to 136 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

331163 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2692 

- Ceres seq_id 1601657 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2693

- Ceres seq_id 1601658 

- Location of start within SEQ ID NO 2692: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2694

- Ceres seq_id 1601659 

- Location of start within SEQ ID NO 2692: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mucin-like glycoprotein 

- Location within SEQ ID NO 2694: from 23 to 102 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1243

- gi No. 423830 
- Description: transactivator EBNA-2 - Herpesvirus papio HVP
>gi|306316 (L11366) EBNA2 gene product [Herpesvirus papio]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2694: from 30 to 41 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2695 

- Ceres seq_id 1601660

- Location of start within SEQ ID NO 2692: at 59 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

331955 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2696 

- Ceres seq_id 1601676 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2697 

- Ceres seq_id 1601677 

- Location of start within SEQ ID NO 2696: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2697: from 28 to 123 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1244

- gi No. 122106 

- Description: HISTONE H4 >gi|70771|pir||HSZM4 histone H4 - maize
>gi|81642|pir||S06904 histone H4 - Arabidopsis thaliana
>gi|2119028|pir||S60475 histone H4 - garden pea >gi|21795|emb|CAA24924|
(X00043) histone H4 [Triticum aestivum] [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 2697: from 27 to 129 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2698 

- Ceres seq_id 1601678 

- Location of start within SEQ ID NO 2696: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2699 

- Ceres seq_id 1601679 

- Location of start within SEQ ID NO 2696: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2699: from 2 to 97 aa. 
(Dp) Related Amino Acid Sequences

- Alignment No. 1245

- gi No. 122106 

- Description: HISTONE H4 >gi|70771|pir||HSZM4 histone H4 - maize
>gi|81642|pir||S06904 histone H4 - Arabidopsis thaliana
>gi|2119028|pir||S60475 histone H4 - garden pea >gi|21795|emb|CAA24924|
(X00043) histone H4 [Triticum aestivum] [Arabidopsis thaliana]

- % Identity: 100 

- Alignment Length: 103 

- Location of Alignment in SEQ ID NO 2699: from 1 to 103 



Maximum Length Sequence: 

related to: 
Clone IDs: 

332119 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2700 

- Ceres seq_id 1601684 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2701 

- Ceres seq_id 1601685 

- Location of start within SEQ ID NO 2700: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1246

- gi No. 3758827 

- Description: (AJ011921) amino acid selective channel protein 
[Hordeum vulgare] 

- % Identity: 70 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 2701: from 27 to 145 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2702 

- Ceres seq_id 1601686 

- Location of start within SEQ ID NO 2700: at 80 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1247 

- gi No. 3758827 

- Description: (AJ011921) amino acid selective channel protein 
[Hordeum vulgare] 

- % Identity: 70 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 2702: from 1 to 119 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2703 

- Ceres seq_id 1601687 

- Location of start within SEQ ID NO 2700: at 140 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1248

- gi No. 3758827 
- Description: (AJ011921) amino acid selective channel protein
[Hordeum vulgare]

- % Identity: 70 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 2703: from 1 to 99 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332639 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2704 

- Ceres seq_id 1601688 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2705 

- Ceres seq_id 1601689 

- Location of start within SEQ ID NO 2704: at 120 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1249

- gi No. 1914853 

- Description: (U92455) WW domain binding protein 7; WBP7 [Mus
musculus]

- % Identity: 72.7 

- Alignment Length: 11

- Location of Alignment in SEQ ID NO 2705: from 94 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2706 

- Ceres seq_id 1601690 

- Location of start within SEQ ID NO 2704: at 168 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1250 

- gi No. 1914853 

- Description: (U92455) WW domain binding protein 7; WBP7 [Mus
musculus]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2706: from 78 to 88 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332650 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2707 

- Ceres seq_id 1601691
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2708 

- Ceres seq_id 1601692 

- Location of start within SEQ ID NO 2707: at 194 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s) 

- Peroxidase 

- Location within SEQ ID NO 2708: from 58 to 109 aa . 
(Dp) Related Amino Acid Sequences

- Alignment No. 1251 

- gi No. 1321661 

- Description: (D45423) ascorbate peroxidase [Oryza sativa] 

- % Identity: 87.3 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2708: from 1 to 109 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2709 

- Ceres seq_id 1601693 

- Location of start within SEQ ID NO 2707: at 299 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Peroxidase 

- Location within SEQ ID NO 2709: from 23 to 74 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1252 

- gi No. 1321661 

- Description: (D45423) ascorbate peroxidase [Oryza sativa] 

- % Identity: 87.3 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2709: from 1 to 74



Maximum Length Sequence: 

related to: 
Clone IDs: 

332741 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2710 

- Ceres seq_id 1601697 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2711 

- Ceres seq_id 1601698 

- Location of start within SEQ ID NO 2710: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2712 

- Ceres seq_id 1601699 

- Location of start within SEQ ID NO 2710: at 89 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Chlorophyll A-B binding proteins 

- Location within SEQ ID NO 2712: from 51 to 138 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1253 

- gi No. 733456 

- Description: (U23189) chlorophyll a/b-binding apoprotein CP26 
precursor [Zea mays]

- % Identity: 100 

- Alignment Length: 139

- Location of Alignment in SEQ ID NO 2712: from 1 to 138 
Maximum Length Sequence:

related to: 
Clone IDs: 

332751 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2713 

- Ceres seq_id 1601700 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2714 

- Ceres seq_id 1601701 

- Location of start within SEQ ID NO 2713: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2715 

- Ceres seq_id 1601702 

- Location of start within SEQ ID NO 2713: at 199 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- S-adenosylmethionine synthetase 

- Location within SEQ ID NO 2715: from 6 to 101 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1254 

- gi No. 1170937 

- Description: S-ADENOSYLMETHIONINE SYNTHETASE 1 (METHIONINE 
ADENOSYLTRANSFERASE 1) (ADOMET SYNTHETASE 1) >gi|450549|emb|CAA81481|
(Z26867) S-adenosyl methionine synthetase [Oryza sativa]

- % Identity: 98 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2715: from 1 to 101 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2716 

- Ceres seq_id 1601703 

- Location of start within SEQ ID NO 2713: at 281 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

332781 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2717 

- Ceres seq_id 1601704 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2718 

- Ceres seq_id 1601705 

- Location of start within SEQ ID NO 2717: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Band 7 family 
- Location within SEQ ID NO 2718: from 46 to 183 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1255

- gi No. 1762949 

- Description: (U66271) ORF; able to induce HR-like lesions 
[Nicotiana tabacum] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2718: from 53 to 71 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2719 

- Ceres seq_id 1601706

- Location of start within SEQ ID NO 2717: at 157 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Band 7 family 

- Location within SEQ ID NO 2719: from 1 to 131 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1256 

- gi No. 1762949 

- Description: (U66271) ORF; able to induce HR-like lesions 
[Nicotiana tabacum] 

- % Identity: 89.5 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2719: from 1 to 19 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2720 

- Ceres seq_id 1601707 

- Location of start within SEQ ID NO 2717: at 259 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Band 7 family 

- Location within SEQ ID NO 2720: from 1 to 97 aa . 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence : 

related to: 
Clone IDs: 

333028 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2721 

- Ceres seq_id 1601719 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2722 

- Ceres seq_id 1601720 

- Location of start within SEQ ID NO 2721: at 212 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2722: from 1 to 98 aa . 



(Dp) Related Amino Acid Sequences 
- Alignment No. 1257 
- gi No. 136636

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 1 (UBIQUITIN-
PROTEIN LIGASE 1) (UBIQUITIN CARRIER PROTEIN 1) >gi|1076424|pir||S43781
ubiquitin-conjugating enzyme UBC1 - Arabidopsis thaliana >gi|442594|pdb|1AAK|
Ubiquitin Conjugating

- % Identity: 95.9 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2722: from 1 to 98 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2723 
- Ceres seq_id 1601721

- Location of start within SEQ ID NO 2721: at 239 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 2723: from 1 to 89 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1258 

- gi No. 136636 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 1 (UBIQUITIN-
PROTEIN LIGASE 1) (UBIQUITIN CARRIER PROTEIN 1) >gi|1076424|pir||S43781
ubiquitin-conjugating enzyme UBC1 - Arabidopsis thaliana >gi|442594|pdb|1AAK|
Ubiquitin Conjugating

- % Identity: 95.9 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2723: from 1 to 89 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2724 

- Ceres seq_id 1601722 

- Location of start within SEQ ID NO 2721: at 260 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2724: from 1 to 82 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1259 

- gi No. 136636 

- Description: UBIQUITIN-CONJUGATING ENZYME E2-17 KD 1 (UBIQUITIN-
PROTEIN LIGASE 1) (UBIQUITIN CARRIER PROTEIN 1) >gi|1076424|pir||S43781
ubiquitin-conjugating enzyme UBC1 - Arabidopsis thaliana >gi|442594|pdb|1AAK|
Ubiquitin Conjugating

- % Identity: 95.9 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2724: from 1 to 82 



Maximum Length Sequence: 

related to: 
Clone IDs: 

333707 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2725 

- Ceres seq_id 1601731 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2726 

- Ceres seq_id 1601732 
- Location of start within SEQ ID NO 2725: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1260

- gi No. 1362077 

- Description: glycin-rich protein - cowpea (fragment)
>gi|871770|emb|CAA61200| (X87948) glycin-rich protein [Vigna unguiculata]

- % Identity: 75 

- Alignment Length: 16 

- Location of Alignment in SEQ ID NO 2726: from 119 to 133

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2727 

- Ceres seq_id 1601733 

- Location of start within SEQ ID NO 2725: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1261

- gi No. 93505 

- Description: ORF5 protein - Orf virus (strain NZ2) >gi|332564
(M30023) ORF5 [Orf virus] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2727: from 41 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2728 

- Ceres seq_id 1601734 

- Location of start within SEQ ID NO 2725: at 33 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1262

- gi No. 93505 

- Description: ORF5 protein - Orf virus (strain NZ2) >gi|332564
(M30023) ORF5 [Orf virus] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2728: from 31 to 41 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334609 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2729 

- Ceres seq_id 1601753 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2730 

- Ceres seq_id 1601754

- Location of start within SEQ ID NO 2729: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2730: from 4 to 111 aa. 
(Dp) Related Amino Acid Sequences

- Alignment No. 1263

- gi No. 1185397 

- Description: (U25281) SH3 domain binding protein [Rattus 
norvegicus] >gi|1587070|prf||2205340A CR16 gene [Rattus norvegicus]

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2730: from 48 to 58 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2731 

- Ceres seq_id 1601755

- Location of start within SEQ ID NO 2729: at 70 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

334742 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2732 

- Ceres seq_id 1601760 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2733 

- Ceres seq_id 1601761 

- Location of start within SEQ ID NO 2732: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 2733: from 87 to 157 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1264 

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase
[Populus tremula x Populus tremuloides] 

- % Identity: 82.5 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2733: from 45 to 157 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2734 

- Ceres seq_id 1601762 

- Location of start within SEQ ID NO 2732: at 108 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 2734: from 52 to 122 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1265 

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase 
[Populus tremula x Populus tremuloides] 

- % Identity: 82.5
- Alignment Length: 114

- Location of Alignment in SEQ ID NO 2734: from 10 to 122 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2735 

- Ceres seq_id 1601763 

- Location of start within SEQ ID NO 2732: at 165 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 2735: from 33 to 103 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1266

- gi No. 5669924 

- Description: (AF149116) soluble inorganic pyrophosphatase 
[Populus tremula x Populus tremuloides] 

- % Identity: 82.5 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2735: from 1 to 103 

Maximum Length Sequence : 

related to: 
Clone IDs: 

335395 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2736 

- Ceres seq_id 1601773 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2737 

- Ceres seq_id 1601774 

- Location of start within SEQ ID NO 2736: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide ( s ) 

- Glutathione S-transferases.

- Location within SEQ ID NO 2737: from 29 to 149 aa. 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2738 

- Ceres seq_id 1601775 

- Location of start within SEQ ID NO 2736: at 77 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Glutathione S-transferases.

- Location within SEQ ID NO 2738: from 4 to 124 aa . 



(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2739 

- Ceres seq_id 1601776 

- Location of start within SEQ ID NO 2736: at 134 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s ) 

- Glutathione S-transferases.

- Location within SEQ ID NO 2739: from 1 to 105 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335402 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2740 

- Ceres seq_id 1601777 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2741 

- Ceres seq_id 1601778 

- Location of start within SEQ ID NO 2740: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Heme-binding domain in cytochrome b5 and oxidoreductases 

- Location within SEQ ID NO 2741: from 69 to 143 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2742 

- Ceres seq_id 1601779 

- Location of start within SEQ ID NO 2740: at 188 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Heme-binding domain in cytochrome b5 and oxidoreductases 

- Location within SEQ ID NO 2742: from 7 to 81 aa . 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2743 

- Ceres seq_id 1601780 

- Location of start within SEQ ID NO 2740: at 215 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Heme-binding domain in cytochrome b5 and oxidoreductases 

- Location within SEQ ID NO 2743: from 1 to 72 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335638 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2744 

- Ceres seq_id 1601792 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2745 

- Ceres seq_id 1601793 

- Location of start within SEQ ID NO 2744: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 
(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2746

- Ceres seq_id 1601794 

- Location of start within SEQ ID NO 2744: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Aminotransferases class-V 

- Location within SEQ ID NO 2746: from 100 to 157 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2747 

- Ceres seq_id 1601795 

- Location of start within SEQ ID NO 2744: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

335743 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2748

- Ceres seq_id 1601800 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2749

- Ceres seq_id 1601801 

- Location of start within SEQ ID NO 2748: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2750 

- Ceres seq_id 1601802 

- Location of start within SEQ ID NO 2748: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1267

- gi No. 2769725 

- Description: (U92288) H88 [Human herpesvirus 6] 

- % Identity: 84.6 

- Alignment Length: 13 

- Location of Alignment in SEQ ID NO 2750: from 16 to 28 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336346 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2751 

- Ceres seq_id 1601803
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2752 

- Ceres seq_id 1601804 

- Location of start within SEQ ID NO 2751: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1268

- gi No. 3421384 

- Description: (AF081067) IAA-Ala hydrolase; IAA-amino acid 
hydrolase [Arabidopsis thaliana] 

- % Identity: 71.2 

- Alignment Length: 59

- Location of Alignment in SEQ ID NO 2752: from 84 to 142 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2753 

- Ceres seq_id 1601805 

- Location of start within SEQ ID NO 2751: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336440 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2754 

- Ceres seq_id 1601806 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2755

- Ceres seq_id 1601807

- Location of start within SEQ ID NO 2754: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1269 

- gi No. 1747296 

- Description: (D45384) vacuolar H+-pyrophosphatase [Oryza sativa]
>gi|3298476|dbj|BAA31524| (AB012766) ovp2 [Oryza sativa]

- % Identity: 90.7 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 2755: from 1 to 85 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2756

- Ceres seq_id 1601808

- Location of start within SEQ ID NO 2754: at 75 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1270

- gi No. 1747296 

- Description: (D45384) vacuolar H+-pyrophosphatase [Oryza sativa]
>gi|3298476|dbj|BAA31524| (AB012766) ovp2 [Oryza sativa]

- % Identity: 90.7

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 2756: from 1 to 61 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336544 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2757 

- Ceres seq_id 1601816 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2758 

- Ceres seq_id 1601817 

- Location of start within SEQ ID NO 2757: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2759 

- Ceres seq_id 1601818 

- Location of start within SEQ ID NO 2757: at 115 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Aminotransferases class-I

- Location within SEQ ID NO 2759: from 38 to 94 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1271 

- gi No. 1353352 

- Description: (U31975) alanine aminotransferase [Chlamydomonas 
reinhardtii] 

- % Identity: 79.8 

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 2759: from 11 to 94 

Maximum Length Sequence: 

related to: 
Clone IDs: 

336576 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2760 

- Ceres seq_id 1601823 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2761

- Ceres seq_id 1601824 

- Location of start within SEQ ID NO 2760: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1272 

- gi No. 100934 

- Description: ubiquitin precursor Ubi-1 - maize
>gi|422037|pir||S20926 ubiquitin precursor Ubi-2 - maize >gi|248337|bbs|94465
(S94464) polyubiquitin (ubiquitin) [maize, Peptide, 533 aa] [Zea mays]
Peptide, 533 aa] [Zea mays] 

- % Identity: 78 

- Alignment Length: 41

- Location of Alignment in SEQ ID NO 2761: from 24 to 64

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2762 

- Ceres seq_id 1601825

- Location of start within SEQ ID NO 2760: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin family 

- Location within SEQ ID NO 2762: from 49 to 102 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1273 

- gi No. 100934 

- Description: ubiquitin precursor Ubi-1 - maize
>gi|422037|pir||S20926 ubiquitin precursor Ubi-2 - maize >gi|248337|bbs|94465
(S94464) polyubiquitin (ubiquitin) [maize, Peptide, 533 aa] [Zea mays]
Peptide, 533 aa] [Zea mays] 

- % Identity: 96.2 

- Alignment Length: 105 

- Location of Alignment in SEQ ID NO 2762: from 53 to 132

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2763 

- Ceres seq_id 1601826 

- Location of start within SEQ ID NO 2760: at 79 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1274

- gi No. 100934 

- Description: ubiquitin precursor Ubi-1 - maize
>gi|422037|pir||S20926 ubiquitin precursor Ubi-2 - maize >gi|248337|bbs|94465
(S94464) polyubiquitin (ubiquitin) [maize, Peptide, 533 aa] [Zea mays] 
Peptide, 533 aa] [Zea mays] 

- % Identity: 78 

- Alignment Length: 41 

- Location of Alignment in SEQ ID NO 2763: from 1 to 38

Maximum Length Sequence: 

related to: 
Clone IDs: 

336955 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2764

- Ceres seq_id 1601840 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2765 

- Ceres seq_id 1601841 

- Location of start within SEQ ID NO 2764: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1275

- gi No. 2190548 

- Description: (AC001229) EST gb|ATTS1121 comes from this gene. 
[Arabidopsis thaliana] 

- % Identity: 86.9

- Alignment Length: 84

- Location of Alignment in SEQ ID NO 2765: from 93 to 175

Maximum Length Sequence: 

related to: 
Clone IDs: 

337057 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2766

- Ceres seq_id 1601858
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2767

- Ceres seq_id 1601859 

- Location of start within SEQ ID NO 2766: at 121 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Inorganic pyrophosphatase 

- Location within SEQ ID NO 2767: from 52 to 124 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1276

- gi No. 4033424 

- Description: SOLUBLE INORGANIC PYROPHOSPHATASE (PYROPHOSPHATE 
PHOSPHO-HYDROLASE) (PPASE) >gi|2668746 (AF034947) inorganic pyrophosphatase
[Zea mays] 

- % Identity: 97.6 

- Alignment Length: 125 

- Location of Alignment in SEQ ID NO 2767: from 1 to 124 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337078 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2768

- Ceres seq_id 1601860 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2769

- Ceres seq_id 1601861 

- Location of start within SEQ ID NO 2768: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Actin 

- Location within SEQ ID NO 2769: from 22 to 173 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1277 

- gi No. 1703108 

- Description: ACTIN 2/7 >gi|2129525|pir||S71210 actin 2 -
Arabidopsis thaliana >gi|2129528|pir||S68107 actin 7 - Arabidopsis thaliana
>gi|1049307 (U37281) actin-2 [Arabidopsis thaliana] >gi|1943863 (U27811)
actin7 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 2769: from 22 to 173

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2770 

- Ceres seq_id 1601862

- Location of start within SEQ ID NO 2768: at 65 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Actin 

- Location within SEQ ID NO 2770: from 1 to 152 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1278

- gi No. 1703108 

- Description: ACTIN 2/7 >gi|2129525|pir||S71210 actin 2 -
Arabidopsis thaliana >gi|2129528|pir||S68107 actin 7 - Arabidopsis thaliana
>gi|1049307 (U37281) actin-2 [Arabidopsis thaliana] >gi|1943863 (U27811)
actin7 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 2770: from 1 to 152 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2771 

- Ceres seq_id 1601863

- Location of start within SEQ ID NO 2768: at 116 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Actin 

- Location within SEQ ID NO 2771: from 1 to 135 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1279

- gi No. 1703108 

- Description: ACTIN 2/7 >gi|2129525|pir||S71210 actin 2 -
Arabidopsis thaliana >gi|2129528|pir||S68107 actin 7 - Arabidopsis thaliana
>gi|1049307 (U37281) actin-2 [Arabidopsis thaliana] >gi|1943863 (U27811)
actin7 [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 153 

- Location of Alignment in SEQ ID NO 2771: from 1 to 135 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337094 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2772 

- Ceres seq_id 1601864
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2773 

- Ceres seq_id 1601865 

- Location of start within SEQ ID NO 2772: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2774 

- Ceres seq_id 1601866 

- Location of start within SEQ ID NO 2772: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1280 

- gi No. 2108256 

- Description: (Y13141) extensin [Bromheadia finlaysoniana]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2774: from 63 to 73 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2775

- Ceres seq_id 1601867 

- Location of start within SEQ ID NO 2772: at 108 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1281 

- gi No. 2108256 

- Description: (Y13141) extensin [Bromheadia finlaysoniana]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2775: from 28 to 38 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337125 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2776 

- Ceres seq_id 1601868
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2777 

- Ceres seq_id 1601869 

- Location of start within SEQ ID NO 2776: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protamine P1

- Location within SEQ ID NO 2777: from 65 to 105 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2778 

- Ceres seq_id 1601870 

- Location of start within SEQ ID NO 2776: at 164 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2779

- Ceres seq_id 1601871

- Location of start within SEQ ID NO 2776: at 179 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

337191 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2780 

- Ceres seq_id 1601872 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2781

- Ceres seq_id 1601873 

- Location of start within SEQ ID NO 2780: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2781: from 2 to 92 aa. 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2782 

- Ceres seq_id 1601874

- Location of start within SEQ ID NO 2780: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2783 

- Ceres seq_id 1601875 

- Location of start within SEQ ID NO 2780: at 73 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1282 

- gi No. 3851005 

- Description: (AF069911) pyruvate dehydrogenase E1 alpha subunit
[Zea mays]

- % Identity: 92.9 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 2783: from 1 to 96

Maximum Length Sequence: 

related to: 
Clone IDs: 

337281 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2784 

- Ceres seq_id 1601883
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2785 

- Ceres seq_id 1601884 

- Location of start within SEQ ID NO 2784: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2786

- Ceres seq_id 1601885 

- Location of start within SEQ ID NO 2784: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2787 

- Ceres seq_id 1601886 

- Location of start within SEQ ID NO 2784: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1283 

- gi No. 639867 

- Description: (D37884) choline kinase R [Rattus rattus] 

- % Identity: 76.9 

- Alignment Length: 13 

- Location of Alignment in SEQ ID NO 2787: from 7 to 19 



Maximum Length Sequence: 

related to: 
Clone IDs: 

337301 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2788 

- Ceres seq_id 1601890 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2789 

- Ceres seq_id 1601891 

- Location of start within SEQ ID NO 2788: at 85 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1284

- gi No. 2407331 

- Description: (AF017787) metallothionein-1 like protein [Oenanthe
javanica]

- % Identity: 91.3 

- Alignment Length: 23 

- Location of Alignment in SEQ ID NO 2789: from 1 to 23 



Maximum Length Sequence: 

related to: 
Clone IDs: 

337309 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2790

- Ceres seq_id 1601892 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2791

- Ceres seq_id 1601893 

- Location of start within SEQ ID NO 2790: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 






- ATP synthase alpha and beta subunits 

- Location within SEQ ID NO 2791: from 70 to 144 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1285

- gi No. 2493131 

- Description: VACUOLAR ATP SYNTHASE SUBUNIT B ISOFORM 1 (V-ATPASE
B SUBUNIT) >gi|167108 (L11862) vacuolar ATPase B subunit [Hordeum vulgare]

- % Identity: 97.1 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2791: from 44 to 144



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2792

- Ceres seq_id 1601894 

- Location of start within SEQ ID NO 2790: at 131 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase alpha and beta subunits 

- Location within SEQ ID NO 2792: from 27 to 101 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1286

- gi No. 2493131 

- Description: VACUOLAR ATP SYNTHASE SUBUNIT B ISOFORM 1 (V-ATPASE
B SUBUNIT) >gi|167108 (L11862) vacuolar ATPase B subunit [Hordeum vulgare]

- % Identity: 97.1 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2792: from 1 to 101 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2793

- Ceres seq_id 1601895

- Location of start within SEQ ID NO 2790: at 158 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ATP synthase alpha and beta subunits 

- Location within SEQ ID NO 2793: from 18 to 92 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1287

- gi No. 2493131 

- Description: VACUOLAR ATP SYNTHASE SUBUNIT B ISOFORM 1 (V-ATPASE
B SUBUNIT) >gi|167108 (L11862) vacuolar ATPase B subunit [Hordeum vulgare]

- % Identity: 97.1 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2793: from 1 to 92



Maximum Length Sequence: 

related to: 
Clone IDs: 

337312 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2794

- Ceres seq_id 1601900
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2795

- Ceres seq_id 1601901

- Location of start within SEQ ID NO 2794: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1288

- gi No. 2088650 

- Description: (AF002109) peroxisomal ATP/ADP carrier protein 
isolog [Arabidopsis thaliana] 

- % Identity: 100 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 2795: from 93 to 112

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2796

- Ceres seq_id 1601902 

- Location of start within SEQ ID NO 2794: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2797

- Ceres seq_id 1601903 

- Location of start within SEQ ID NO 2794: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

337329 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2798

- Ceres seq_id 1601904 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2799

- Ceres seq_id 1601905 

- Location of start within SEQ ID NO 2798: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Mucin-like glycoprotein 

- Location within SEQ ID NO 2799: from 17 to 100 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337354 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2800 

- Ceres seq_id 1601909 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2801 

- Ceres seq_id 1601910 

- Location of start within SEQ ID NO 2800: at 1 nt . 




(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2802

- Ceres seq_id 1601911 

- Location of start within SEQ ID NO 2800: at 308 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1289

- gi No. 1143511 

- Description: (Z47076) Ser/Thr protein phosphatase homologous to
PPX [Malus domestica] >gi|1586034|prf||2202340A Ser/Thr protein phosphatase
[Malus domestica]

- % Identity: 75.8 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 2802: from 1 to 42 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337432 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2803 

- Ceres seq_id 1601916 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2804 

- Ceres seq_id 1601917 

- Location of start within SEQ ID NO 2803: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2805 

- Ceres seq_id 1601918 

- Location of start within SEQ ID NO 2803: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2806 

- Ceres seq_id 1601919 

- Location of start within SEQ ID NO 2803: at 141 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1290

- gi No. 1172773 

- Description: TRANSCRIPTIONAL ACTIVATOR PROTEIN PUR-ALPHA (PURINE 
RICH SINGLE-STRANDED DNA-BINDING PROTEIN ALPHA) >gi|404650 (U02098) Pur-alph
[Mus musculus] >gi|2460121|gb|AAB71860.1| (AF017631) purine-rich
single-stranded DNA-binding protein alpha

- % Identity: 70.8

- Alignment Length: 24

- Location of Alignment in SEQ ID NO 2806: from 15 to 38 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337566 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2807

- Ceres seq_id 1601928
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2808 

- Ceres seq_id 1601929 

- Location of start within SEQ ID NO 2807: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2809 

- Ceres seq_id 1601930 

- Location of start within SEQ ID NO 2807: at 226 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain) 

- Location within SEQ ID NO 2809: from 4 to 69 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1291

- gi No. 4033467 

- Description: ARGININE/SERINE-RICH SPLICING FACTOR RSP31
>gi|1707366|emb|CAA67798| (X99435) splicing factor [Arabidopsis thaliana]

- % Identity: 70.3 

- Alignment Length: 74

- Location of Alignment in SEQ ID NO 2809: from 1 to 74 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2810 

- Ceres seq_id 1601931 

- Location of start within SEQ ID NO 2807: at 322 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1292

- gi No. 4033467 

- Description: ARGININE/SERINE-RICH SPLICING FACTOR RSP31
>gi|1707366|emb|CAA67798| (X99435) splicing factor [Arabidopsis thaliana]

- % Identity: 70.3 

- Alignment Length: 74 

- Location of Alignment in SEQ ID NO 2810: from 1 to 42 



Maximum Length Sequence:

related to: 
Clone IDs: 

337619 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2811 

- Ceres seq_id 1601940
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2812 

- Ceres seq_id 1601941 

- Location of start within SEQ ID NO 2811: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2813 

- Ceres seq_id 1601942 

- Location of start within SEQ ID NO 2811: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2814 

- Ceres seq_id 1601943 

- Location of start within SEQ ID NO 2811: at 115 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1293

- gi No. 2815582 

- Description: (AF012911) histidine hexamer-multiple cloning site- 
histidine decamer [plasmid pETHIS-1] 

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2814: from 5 to 15 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337663 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2815 

- Ceres seq_id 1601947 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2816 

- Ceres seq_id 1601948 

- Location of start within SEQ ID NO 2815: at 133 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2816: from 1 to 109 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1294 

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 100 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2816: from 1 to 109

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2817

- Ceres seq_id 1601949

- Location of start within SEQ ID NO 2815: at 220 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Ubiquitin-conjugating enzyme

- Location within SEQ ID NO 2817: from 1 to 80 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1295

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 100 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2817: from 1 to 80 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2818 

- Ceres seq_id 1601950 

- Location of start within SEQ ID NO 2815: at 244 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ubiquitin-conjugating enzyme 

- Location within SEQ ID NO 2818: from 1 to 72 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1296 

- gi No. 2668744 

- Description: (AF034946) ubiquitin conjugating enzyme [Zea mays] 

- % Identity: 100 

- Alignment Length: 110 

- Location of Alignment in SEQ ID NO 2818: from 1 to 72 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337687 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2819 

- Ceres seq_id 1601957 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2820 

- Ceres seq_id 1601958 

- Location of start within SEQ ID NO 2819: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2821 

- Ceres seq_id 1601959 

- Location of start within SEQ ID NO 2819: at 139 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 2821: from 16 to 101 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1297

- gi No. 4185513 

- Description: (AF102823) actin depolymerizing factor 5 
[Arabidopsis thaliana] >gi|4185517 (AF102825) actin depolymerizing factor 5
[Arabidopsis thaliana] 

- % Identity: 73.5 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2821: from 1 to 101 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2822 

- Ceres seq_id 1601960 

- Location of start within SEQ ID NO 2819: at 145 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Cofilin/tropomyosin-type actin-binding proteins

- Location within SEQ ID NO 2822: from 14 to 99 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1298 

- gi No. 4185513 

- Description: (AF102823) actin depolymerizing factor 5 
[Arabidopsis thaliana] >gi|4185517 (AF102825) actin depolymerizing fact
[Arabidopsis thaliana] 

- % Identity: 73.5 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2822: from 1 to 99 



Maximum Length Sequence: 

related to: 
Clone IDs: 

337827 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2823 

- Ceres seq_id 1601976
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2824 

- Ceres seq_id 1601977 

- Location of start within SEQ ID NO 2823: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1299

- gi No. 1346530 

- Description: MAGO NASHI PROTEIN HOMOLOG

- % Identity: 98.5 

- Alignment Length: 65 

- Location of Alignment in SEQ ID NO 2824: from 56 to 120 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2825 

- Ceres seq_id 1601978 

- Location of start within SEQ ID NO 2823: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2826

- Ceres seq_id 1601979 

- Location of start within SEQ ID NO 2823: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337838 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2827 

- Ceres seq_id 1601980 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2828 

- Ceres seq_id 1601981 

- Location of start within SEQ ID NO 2827: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2829 

- Ceres seq_id 1601982 

- Location of start within SEQ ID NO 2827: at 141 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1300 

- gi No. 4914432 

- Description: (AL050351) ribosomal protein S25 [Arabidopsis
thaliana]

- % Identity: 76.4 

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2829: from 1 to 89 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337890 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2830 

- Ceres seq_id 1601994 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2831 

- Ceres seq_id 1601995 

- Location of start within SEQ ID NO 2830: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- RNA recognition motif, (a.k.a. RRM, RBD, or RNP domain)

- Location within SEQ ID NO 2831: from 2 to 60 aa . 

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2832

- Ceres seq_id 1601996

- Location of start within SEQ ID NO 2830: at 184 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

337928 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2833 

- Ceres seq_id 1601997 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2834 

- Ceres seq_id 1601998

- Location of start within SEQ ID NO 2833: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1301

- gi No. 3970680 

- Description: (AL034388) 67A9.b [Drosophila melanogaster] 

- % Identity: 80 

- Alignment Length: 15 

- Location of Alignment in SEQ ID NO 2834: from 111 to 125 

Maximum Length Sequence:

related to: 
Clone IDs: 

337949 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2835 

- Ceres seq_id 1602005 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2836 

- Ceres seq_id 1602006 

- Location of start within SEQ ID NO 2835: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1302 

- gi No. 102706 

- Description: protamine - common cuttlefish 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 2836: from 114 to 127 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2837 

- Ceres seq_id 1602007 

- Location of start within SEQ ID NO 2835: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Aldehyde dehydrogenase 

- Location within SEQ ID NO 2837: from 15 to 146 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1303 

- gi No. 1711534 

- Description: SUCCINATE SEMIALDEHYDE DEHYDROGENASE (NAD(+)-DEPENDENT
SUCCINIC SEMIALDEHYDE DEHYDROGENASE) >gi|2136207|pir||A55773
succinate-semialdehyde dehydrogenase (EC 1.2.1.24) - human (fragment)
>gi 1556221 (L34820) succinate

- % Identity: 73 

- Alignment Length: 37 

- Location of Alignment in SEQ ID NO 2837: from 110 to 146 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338191 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2838 

- Ceres seq_id 1602032 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2839 

- Ceres seq_id 1602033 

- Location of start within SEQ ID NO 2838: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2840 

- Ceres seq_id 1602034 

- Location of start within SEQ ID NO 2838: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2841 

- Ceres seq_id 1602035 

- Location of start within SEQ ID NO 2838: at 124 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- DnaJ domain 

- Location within SEQ ID NO 2841: from 13 to 74 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1304 

- gi No. 2984709 

- Description: (AF053468) DnaJ-related protein ZMDJ1 [Zea mays] 

- % Identity: 99.1 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 2841: from 1 to 111 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338275 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2842 

- Ceres seq_id 1602039 






(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2843 

- Ceres seq_id 1602040

- Location of start within SEQ ID NO 2842: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal L27 protein 

- Location within SEQ ID NO 2843: from 59 to 139 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1305 

- gi No. 2961176 

- Description: (AF050674) ribosomal protein L27 precursor [Oryza sativa]

- % Identity: 83.8 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 2843: from 4 to 157 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2844 

- Ceres seq_id 1602041 

- Location of start within SEQ ID NO 2842: at 10 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal L27 protein 

- Location within SEQ ID NO 2844: from 56 to 136 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1306

- gi No. 2961176 

- Description: (AF050674) ribosomal protein L27 precursor [Oryza sativa]

- % Identity: 83.8 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 2844: from 1 to 154 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2845 

- Ceres seq_id 1602042 

- Location of start within SEQ ID NO 2842: at 19 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal L27 protein 

- Location within SEQ ID NO 2845: from 53 to 133 aa . 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1307 

- gi No. 2961176 

- Description: (AF050674) ribosomal protein L27 precursor [Oryza sativa]

- % Identity: 83.8 

- Alignment Length: 160 

- Location of Alignment in SEQ ID NO 2845: from 1 to 151



Maximum Length Sequence: 

related to: 
Clone IDs: 

338455

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2846 

- Ceres seq_id 1602053 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2847 

- Ceres seq_id 1602054 

- Location of start within SEQ ID NO 2846: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2848 

- Ceres seq_id 1602055

- Location of start within SEQ ID NO 2846: at 154 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1308 

- gi No. 3063467 

- Description: (AC003981) F22O13.29 [Arabidopsis thaliana]

- % Identity: 71.4 

- Alignment Length: 77 

- Location of Alignment in SEQ ID NO 2848: from 1 to 75

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2849

- Ceres seq_id 1602056

- Location of start within SEQ ID NO 2846: at 296 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338526 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2850 

- Ceres seq_id 1602060 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2851 

- Ceres seq_id 1602061 

- Location of start within SEQ ID NO 2850: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GDP dissociation inhibitor 

- Location within SEQ ID NO 2851: from 49 to 161 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1309

- gi No. 2384758 

- Description: (AF016896) GDP dissociation inhibitor protein OsGDI1
[Oryza sativa]

- % Identity: 100 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2851: from 49 to 161

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2852 

- Ceres seq_id 1602062

- Location of start within SEQ ID NO 2850: at 147 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- GDP dissociation inhibitor 

- Location within SEQ ID NO 2852: from 1 to 113 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1310 

- gi No. 2384758 

- Description: (AF016896) GDP dissociation inhibitor protein OsGDI1
[Oryza sativa]

- % Identity: 100 

- Alignment Length: 114 

- Location of Alignment in SEQ ID NO 2852: from 1 to 113 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2853 

- Ceres seq_id 1602063 

- Location of start within SEQ ID NO 2850: at 243 nt . 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- GDP dissociation inhibitor 

- Location within SEQ ID NO 2853: from 1 to 81 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1311 

- gi No. 2384758 

- Description: (AF016896) GDP dissociation inhibitor protein OsGDI1
[Oryza sativa]

- % Identity: 100

- Alignment Length: 114

- Location of Alignment in SEQ ID NO 2853: from 1 to 81

Maximum Length Sequence:

related to:
Clone IDs:

338699

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2854

- Ceres seq_id 1602076
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2855

- Ceres seq_id 1602077

- Location of start within SEQ ID NO 2854: at 93 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1312

- gi No. 2388986

- Description: (Z98980) actin associated protein
[Schizosaccharomyces pombe]

- % Identity: 72.7

- Alignment Length: 11

- Location of Alignment in SEQ ID NO 2855: from 122 to 132

Maximum Length Sequence:

related to: 
Clone IDs: 

338705 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2856 

- Ceres seq_id 1602078 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2857 

- Ceres seq_id 1602079 

- Location of start within SEQ ID NO 2856: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1313 

- gi No. 3451392 

- Description: (AJ001264) mitochondrial uncoupling protein 
[Arabidopsis thaliana] >gi|4127446|emb|CAA77109| (Y18291) uncoupling protein
[Arabidopsis thaliana] 

- % Identity: 71.1 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2857: from 107 to 144 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2858 

- Ceres seq_id 1602080

- Location of start within SEQ ID NO 2856: at 164 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Mitochondrial carrier proteins 

- Location within SEQ ID NO 2858: from 12 to 98 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1314 

- gi No. 5031287 

- Description: (AF139921) uncoupling protein 1 [Bos taurus] 

- % Identity: 90.9 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2858: from 34 to 44 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2859 

- Ceres seq_id 1602081 

- Location of start within SEQ ID NO 2856: at 235 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1315 

- gi No. 3451392 

- Description: (AJ001264) mitochondrial uncoupling protein 
[Arabidopsis thaliana] >gi|4127446|emb|CAA77109| (Y18291) uncoupling protein
[Arabidopsis thaliana] 

- % Identity: 71.1 

- Alignment Length: 38 

- Location of Alignment in SEQ ID NO 2859: from 29 to 66

Maximum Length Sequence:

related to:
Clone IDs: 

338733 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2860 

- Ceres seq_id 1602089 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2861 

- Ceres seq_id 1602090 

- Location of start within SEQ ID NO 2860: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2862 

- Ceres seq_id 1602091 

- Location of start within SEQ ID NO 2860: at 140 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1316 

- gi No. 3128180 

- Description: (AC004521) citrate synthetase [Arabidopsis thaliana] 

- % Identity: 78 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 2862: from 1 to 107

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2863 

- Ceres seq_id 1602092

- Location of start within SEQ ID NO 2860: at 188 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1317 

- gi No. 3128180 

- Description: (AC004521) citrate synthetase [Arabidopsis thaliana]

- % Identity: 78 

- Alignment Length: 109 

- Location of Alignment in SEQ ID NO 2863: from 1 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338740 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2864

- Ceres seq_id 1602093 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2865

- Ceres seq_id 1602094 

- Location of start within SEQ ID NO 2864: at 157 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1318

- gi No. 1827677

- Description: Brassica Napus Enoyl Acp ReductaseNADH BINARY
COMPLEX AT Ph 8.0 And Room Temperature >gi|1827678|pdb|1ENO| Brassica Napus
Enoyl Acp ReductaseNAD BINARY COMPLEX AT Ph 8.0 And Room Temperature

- % Identity: 86.4 

- Alignment Length: 59

- Location of Alignment in SEQ ID NO 2865: from 72 to 129 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2866 

- Ceres seq_id 1602095 

- Location of start within SEQ ID NO 2864: at 181 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1319 

- gi No. 1827677 

- Description: Brassica Napus Enoyl Acp ReductaseNADH BINARY 
COMPLEX AT Ph 8.0 And Room Temperature >gi|1827678|pdb|1ENO| Brassica Napus
Enoyl Acp ReductaseNAD BINARY COMPLEX AT Ph 8.0 And Room Temperature 

- % Identity: 86.4 

- Alignment Length: 59

- Location of Alignment in SEQ ID NO 2866: from 64 to 121 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2867

- Ceres seq_id 1602096

- Location of start within SEQ ID NO 2864: at 187 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1320 

- gi No. 1827677 

- Description: Brassica Napus Enoyl Acp ReductaseNADH BINARY 
COMPLEX AT Ph 8.0 And Room Temperature >gi|1827678|pdb|1ENO| Brassica Napus
Enoyl Acp ReductaseNAD BINARY COMPLEX AT Ph 8.0 And Room Temperature 

- % Identity: 86.4 

- Alignment Length: 59 

- Location of Alignment in SEQ ID NO 2867: from 62 to 119 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338771 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2868 

- Ceres seq_id 1602104 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2869 

- Ceres seq_id 1602105

- Location of start within SEQ ID NO 2868: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2870 

- Ceres seq_id 1602106

- Location of start within SEQ ID NO 2868: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2871 

- Ceres seq_id 1602107 

- Location of start within SEQ ID NO 2868: at 235 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ras family 

- Location within SEQ ID NO 2871: from 17 to 92 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1321 

- gi No. 114088 

- Description: RAS-RELATED PROTEIN ARA-3 >gi|320560|pir||JS0640
GTP-binding protein ara-3 - Arabidopsis thaliana >gi|217837|dbj|BAA00830|
(D01025) small GTP-binding protein [Arabidopsis thaliana] 

- % Identity: 97.8 

- Alignment Length: 92

- Location of Alignment in SEQ ID NO 2871: from 1 to 92 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338822 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2872 

- Ceres seq_id 1602114 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2873 

- Ceres seq_id 1602115 

- Location of start within SEQ ID NO 2872: at 139 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1322 

- gi No. 4006848 

- Description: (AJ131433) selenocysteine methyltransferase
[Astragalus bisulcatus] 

- % Identity: 72.3 

- Alignment Length: 94 

- Location of Alignment in SEQ ID NO 2873: from 20 to 113 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338915 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2874

- Ceres seq_id 1602137 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2875 

- Ceres seq_id 1602138 

- Location of start within SEQ ID NO 2874: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Collagen triple helix repeat (20 copies)

- Location within SEQ ID NO 2875: from 86 to 134 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1323 

- gi No. 2088843 

- Description: (AF003386) F59E12.9 gene product [Caenorhabditis elegans]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2875: from 11 to 22 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2876 

- Ceres seq_id 1602139 

- Location of start within SEQ ID NO 2874: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1324 

- gi No. 310622 

- Description: (L20249) homologous to Saccharopolyspora erythraea 
beta-ketoacylsynthase [Streptomyces coriofaciens]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2876: from 128 to 139 



(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2877 

- Ceres seq_id 1602140 

- Location of start within SEQ ID NO 2874: at 5 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1325 

- gi No. 310622 

- Description: (L20249) homologous to Saccharopolyspora erythraea
beta-ketoacylsynthase [Streptomyces coriofaciens]

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2877: from 127 to 138 



Maximum Length Sequence:

related to: 
Clone IDs: 

338933 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2878 

- Ceres seq_id 1602141 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2879 

- Ceres seq_id 1602142 

- Location of start within SEQ ID NO 2878: at 1 nt. 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 2879: from 37 to 122 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2880 

- Ceres seq_id 1602143 

- Location of start within SEQ ID NO 2878: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2881 

- Ceres seq_id 1602144

- Location of start within SEQ ID NO 2878: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

338964 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2882 

- Ceres seq_id 1602152 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2883 

- Ceres seq_id 1602153 

- Location of start within SEQ ID NO 2882: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1326 

- gi No. 2274845 

- Description: (D88461) N-WASP [Rattus rattus] 

- % Identity: 81.8 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2883: from 24 to 34 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2884 

- Ceres seq_id 1602154 

- Location of start within SEQ ID NO 2882: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2885 

- Ceres seq_id 1602155 

- Location of start within SEQ ID NO 2882: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Collagen triple helix repeat (20 copies)

- Location within SEQ ID NO 2885: from 41 to 83 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338968 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2886 

- Ceres seq_id 1602156 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2887 

- Ceres seq_id 1602157 

- Location of start within SEQ ID NO 2886: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1327 

- gi No. 1657855 

- Description: (U73216) cold acclimation protein WCOR413 [Triticum aestivum]

- % Identity: 72.7 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 2887: from 57 to 154 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2888 

- Ceres seq_id 1602158 

- Location of start within SEQ ID NO 2886: at 153 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1328 

- gi No. 1657855

- Description: (U73216) cold acclimation protein WCOR413 [Triticum aestivum]

- % Identity: 72.7 

- Alignment Length: 99

- Location of Alignment in SEQ ID NO 2888: from 7 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2889 

- Ceres seq_id 1602159 

- Location of start within SEQ ID NO 2886: at 183 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1329 

- gi No. 1657855

- Description: (U73216) cold acclimation protein WCOR413 [Triticum aestivum]

- % Identity: 72.7 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 2889: from 1 to 94 



Maximum Length Sequence:

related to:
Clone IDs:

339014 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2890

- Ceres seq_id 1602160 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2891 

- Ceres seq_id 1602161 

- Location of start within SEQ ID NO 2890: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family) 

- Location within SEQ ID NO 2891: from 4 to 79 aa . 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2892

- Ceres seq_id 1602162 

- Location of start within SEQ ID NO 2890: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2893 

- Ceres seq_id 1602163 

- Location of start within SEQ ID NO 2890: at 3 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1330 

- gi No. 2315424 

- Description: (AF016446) similar to C. elegans cuticulin precursor 
CUT-2 (SP:P34682) [Caenorhabditis elegans] 

- % Identity: 75 

- Alignment Length: 12 

- Location of Alignment in SEQ ID NO 2893: from 61 to 72

Maximum Length Sequence: 

related to: 
Clone IDs: 

339063 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2894 

- Ceres seq_id 1602172 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2895 

- Ceres seq_id 1602173 

- Location of start within SEQ ID NO 2894: at 92 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Dehydrins 

- Location within SEQ ID NO 2895: from 21 to 123 aa. 



(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2896 

- Ceres seq_id 1602174 

- Location of start within SEQ ID NO 2894: at 180 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Mucin-like glycoprotein 

- Location within SEQ ID NO 2896: from 1 to 102 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2897

- Ceres seq_id 1602175 

- Location of start within SEQ ID NO 2894: at 225 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Mucin-like glycoprotein 

- Location within SEQ ID NO 2897: from 1 to 87 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

339134 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2898

- Ceres seq_id 1602183 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2899

- Ceres seq_id 1602184 

- Location of start within SEQ ID NO 2898: at 75 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1331 

- gi No. 2292978 

- Description: (Y10253) pantoate--beta-alanine ligase [Oryza sativa]

- % Identity: 84.7

- Alignment Length: 111

- Location of Alignment in SEQ ID NO 2899: from 6 to 115

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2900

- Ceres seq_id 1602185

- Location of start within SEQ ID NO 2898: at 132 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1332

- gi No. 2292978

- Description: (Y10253) pantoate--beta-alanine ligase [Oryza sativa]

- % Identity: 84.7

- Alignment Length: 111

- Location of Alignment in SEQ ID NO 2900: from 1 to 96

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2901

- Ceres seq_id 1602186 

- Location of start within SEQ ID NO 2898: at 192 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1333 

- gi No. 2292978 

- Description: (Y10253) pantoate--beta-alanine ligase [Oryza sativa]

- % Identity: 84.7

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 2901: from 1 to 76 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339139 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2902 

- Ceres seq_id 1602187 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2903

- Ceres seq_id 1602188 

- Location of start within SEQ ID NO 2902: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2903: from 66 to 158 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1334 

- gi No. 1208642 

- Description: (U47378) histone H3 [Amphicarpa bracteata] 

- % Identity: 76.9 

- Alignment Length: 39 

- Location of Alignment in SEQ ID NO 2903: from 76 to 114

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2904 

- Ceres seq_id 1602189 

- Location of start within SEQ ID NO 2902: at 65 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Core histone H2A/H2B/H3/H4 

- Location within SEQ ID NO 2904: from 45 to 137 aa . 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1335 

- gi No. 1208642 

- Description: (U47378) histone H3 [Amphicarpa bracteata] 

- % Identity: 76.9

- Alignment Length: 39 

- Location of Alignment in SEQ ID NO 2904: from 55 to 93

Maximum Length Sequence:

related to: 
Clone IDs: 

339150 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2905 

- Ceres seq_id 1602190
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2906 

- Ceres seq_id 1602191 

- Location of start within SEQ ID NO 2905: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Hsp70 protein 

- Location within SEQ ID NO 2906: from 1 to 160 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1336

- gi No. 1708311 

- Description: CHLOROPLAST STROMA 70 KD HEAT SHOCK-RELATED PROTEIN
>gi|170094 (M99565) 80 kDa heat shock protein [Spinacia oleracea]

- % Identity: 95.7 

- Alignment Length: 161 

- Location of Alignment in SEQ ID NO 2906: from 1 to 160

Maximum Length Sequence: 

related to: 
Clone IDs: 

339196 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2907 

- Ceres seq_id 1602192
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2908 

- Ceres seq_id 1602193 

- Location of start within SEQ ID NO 2907: at 176 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1337 

- gi No. 1042189 

- Description: T2=testis-specific pro-protamine [Loligo
pealeii=squids, testis chromatin, Peptide, 79 aa] 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 2908: from 55 to 65 

Maximum Length Sequence: 

related to: 
Clone IDs:

339267 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2909 

- Ceres seq_id 1602198 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2910 

- Ceres seq_id 1602199 

- Location of start within SEQ ID NO 2909: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2911 

- Ceres seq_id 1602200 

- Location of start within SEQ ID NO 2909: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- KH domain 

- Location within SEQ ID NO 2911: from 77 to 125 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2912 

- Ceres seq_id 1602201

- Location of start within SEQ ID NO 2909: at 105 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- KH domain 

- Location within SEQ ID NO 2912: from 43 to 91 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339459 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2913 

- Ceres seq_id 1602209 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2914 

- Ceres seq_id 1602210 

- Location of start within SEQ ID NO 2913: at 181 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1338 

- gi No. 3913427 

- Description: S-ADENOSYLMETHIONINE DECARBOXYLASE PROENZYME 
(ADOMETDC) (SAMDC) >gi|1532073|emb|CAA69075| (Y07767) S-adenosylmethionine
decarboxylase [Zea mays] 

- % Identity: 79.6 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2914: from 1 to 54 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2915 

- Ceres seq_id 1602211 

- Location of start within SEQ ID NO 2913: at 196 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1339

- gi No. 3913427

- Description: S-ADENOSYLMETHIONINE DECARBOXYLASE PROENZYME 
(ADOMETDC) (SAMDC) >gi|1532073|emb|CAA69075| (Y07767) S-adenosylmethionine
decarboxylase [Zea mays] 

- % Identity: 79.6 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2915: from 1 to 49

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2916 

- Ceres seq_id 1602212 

- Location of start within SEQ ID NO 2913: at 202 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1340 

- gi No. 3913427 

- Description: S-ADENOSYLMETHIONINE DECARBOXYLASE PROENZYME 
(ADOMETDC) (SAMDC) >gi|1532073|emb|CAA69075| (Y07767) S-adenosylmethionine
decarboxylase [Zea mays] 

- % Identity: 79.6

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2916: from 1 to 47 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339483 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2917 

- Ceres seq_id 1602213 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2918 

- Ceres seq_id 1602214 

- Location of start within SEQ ID NO 2917: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2919 

- Ceres seq_id 1602215 

- Location of start within SEQ ID NO 2917: at 2 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protamine P1

- Location within SEQ ID NO 2919: from 48 to 87 aa . 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2920 

- Ceres seq_id 1602216 

- Location of start within SEQ ID NO 2917: at 137 nt . 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Protamine P1

- Location within SEQ ID NO 2920: from 3 to 42 aa.

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

339573 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2921 

- Ceres seq_id 1602221
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2922

- Ceres seq_id 1602222 

- Location of start within SEQ ID NO 2921: at 1 nt . 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2923 

- Ceres seq_id 1602223 

- Location of start within SEQ ID NO 2921: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1341 

- gi No. 4063746 

- Description: (AC005851) nodulin-like protein [Arabidopsis thaliana]

- % Identity: 74.6 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 2923: from 1 to 141 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2924 

- Ceres seq_id 1602224 

- Location of start within SEQ ID NO 2921: at 95 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1342 

- gi No. 4063746 

- Description: (AC005851) nodulin-like protein [Arabidopsis thaliana]

- % Identity: 74.6 

- Alignment Length: 142 

- Location of Alignment in SEQ ID NO 2924: from 1 to 110 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339627 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 2925

- Ceres seq_id 1602229 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2926

- Ceres seq_id 1602230

- Location of start within SEQ ID NO 2925: at 104 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 3' exoribonuclease family 

- Location within SEQ ID NO 2926: from 7 to 137 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2927

- Ceres seq_id 1602231

- Location of start within SEQ ID NO 2925: at 107 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 3' exoribonuclease family

- Location within SEQ ID NO 2927: from 6 to 136 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2928 

- Ceres seq_id 1602232 

- Location of start within SEQ ID NO 2925: at 129 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

339735 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2929 

- Ceres seq_id 1602233 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2930 

- Ceres seq_id 1602234

- Location of start within SEQ ID NO 2929: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1343 

- gi No. 3901299 

- Description: (AF095636) urease accessory protein UreE [Yersinia pestis]

- % Identity: 73.7 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2930: from 96 to 114 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2931 

- Ceres seq_id 1602235 

- Location of start within SEQ ID NO 2929: at 90 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1344

- gi No. 3901299 

- Description: (AF095636) urease accessory protein UreE [Yersinia pestis]

- % Identity: 73.7 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2931: from 67 to 85

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2932

- Ceres seq_id 1602236

- Location of start within SEQ ID NO 2929: at 183 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1345 

- gi No. 3901299 

- Description: (AF095636) urease accessory protein UreE [Yersinia pestis]

- % Identity: 73.7 

- Alignment Length: 19 

- Location of Alignment in SEQ ID NO 2932: from 36 to 54 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338135 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2933 

- Ceres seq_id 1602239 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2934 

- Ceres seq_id 1602240 

- Location of start within SEQ ID NO 2933: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1346

- gi No. 3036812 

- Description: (AL022373) ATM-like protein [Arabidopsis thaliana] 

- % Identity: 77.1 

- Alignment Length: 157 

- Location of Alignment in SEQ ID NO 2934: from 1 to 156 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2935 

- Ceres seq_id 1602242

- Location of start within SEQ ID NO 2933: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1347 

- gi No. 3036812 

- Description: (AL022373) ATM-like protein [Arabidopsis thaliana] 

- % Identity: 77.1 

- Alignment Length: 157 

- Location of Alignment in SEQ ID NO 2935: from 1 to 128

Maximum Length Sequence:

related to: 
Clone IDs: 

338274 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2936 

- Ceres seq_id 1602253 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2937 

- Ceres seq_id 1602254 

- Location of start within SEQ ID NO 2936: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2938 

- Ceres seq_id 1602255 

- Location of start within SEQ ID NO 2936: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- 7 transmembrane receptor (rhodopsin family)

- Location within SEQ ID NO 2938: from 9 to 76 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2939 

- Ceres seq_id 1602256

- Location of start within SEQ ID NO 2936: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

338401 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2940 

- Ceres seq_id 1602264 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2941

- Ceres seq_id 1602265

- Location of start within SEQ ID NO 2940: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Adhesion lipoprotein 

- Location within SEQ ID NO 2941: from 53 to 117 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1348 

- gi No. 1644232 

- Description: (D67066) N-WASP [Bos taurus] 

- % Identity: 71.4 

- Alignment Length: 21 



Attorney Docket No. 2750-1237P Table 1 

Client Docket No. 80146.003 Page 611 

- Location of Alignment in SEQ ID NO 2941: from 98 to 117 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2942

- Ceres seq_id 1602266

- Location of start within SEQ ID NO 2940: at 102 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- Adhesion lipoprotein 

- Location within SEQ ID NO 2942: from 20 to 84 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1349

- gi No. 1644232 

- Description: (D67066) N-WASP [Bos taurus] 

- % Identity: 71.4 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 2942: from 65 to 84 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2943 

- Ceres seq_id 1602267 

- Location of start within SEQ ID NO 2940: at 117 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Adhesion lipoprotein 

- Location within SEQ ID NO 2943: from 15 to 79 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1350 

- gi No. 1644232 

- Description: (D67066) N-WASP [Bos taurus] 

- % Identity: 71.4 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 2943: from 60 to 79

Maximum Length Sequence: 

related to: 
Clone IDs: 

339027 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2944 

- Ceres seq_id 1602284 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2945 

- Ceres seq_id 1602285

- Location of start within SEQ ID NO 2944: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2946 

- Ceres seq_id 1602286 

- Location of start within SEQ ID NO 2944: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1351 

- gi No. 1914851 

- Description: (U92454) WW domain binding protein 5; WBP5 [Mus musculus]

- % Identity: 72.2 

- Alignment Length: 18 

- Location of Alignment in SEQ ID NO 2946: from 17 to 34 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2947

- Ceres seq_id 1602287

- Location of start within SEQ ID NO 2944: at 213 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- SRF-type transcription factor (DNA-binding and dimerisation domain)

- Location within SEQ ID NO 2947: from 1 to 59 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1352 

- gi No. 939785 

- Description: (L46400) MADS box protein [Zea mays] 

- % Identity: 98.8 

- Alignment Length: 84 

- Location of Alignment in SEQ ID NO 2947: from 1 to 83 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339291 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2948

- Ceres seq_id 1602302
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2949

- Ceres seq_id 1602303

- Location of start within SEQ ID NO 2948: at 3 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1353 

- gi No. 232183 

- Description: GLYCINE-RICH CELL WALL STRUCTURAL PROTEIN 2
PRECURSOR >gi|72322|pir||KNRZG2 glycine-rich cell wall structural protein 2
precursor - rice >gi|20245|emb|CAA38315| (X54449) Glycine-rich protein [Oryza
sativa] >gi|1167557 (U40708)

- % Identity: 70.6 

- Alignment Length: 17 

- Location of Alignment in SEQ ID NO 2949: from 106 to 122

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2950 

- Ceres seq_id 1602304 

- Location of start within SEQ ID NO 2948: at 154 nt. 



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

339338 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2951 

- Ceres seq_id 1602305 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2952 

- Ceres seq_id 1602306 

- Location of start within SEQ ID NO 2951: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2952: from 74 to 147 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1354 

- gi No. 5714762 

- Description: (AF173881) serine/threonine protein phosphatase 
PP2A-4 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 83 

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2952: from 61 to 147 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2953 

- Ceres seq_id 1602307 

- Location of start within SEQ ID NO 2951: at 182 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2953: from 14 to 87 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1355 

- gi No. 5714762 

- Description: (AF173881) serine/threonine protein phosphatase 
PP2A-4 catalytic subunit [Oryza sativa subsp. indica] 

- % Identity: 83 

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2953: from 1 to 87 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2954 

- Ceres seq_id 1602308 

- Location of start within SEQ ID NO 2951: at 191 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ser/Thr protein phosphatase 

- Location within SEQ ID NO 2954: from 11 to 84 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1356

- gi No. 5714762 

- Description: (AF173881) serine/threonine protein phosphatase 
PP2A-4 catalytic subunit [Oryza sativa subsp. indica]

- % Identity: 83

- Alignment Length: 89

- Location of Alignment in SEQ ID NO 2954: from 1 to 84 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339347 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2955 

- Ceres seq_id 1602309 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2956 

- Ceres seq_id 1602310 

- Location of start within SEQ ID NO 2955: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

- Cytochrome P450

- Location within SEQ ID NO 2956: from 85 to 149 aa. 
(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2957 

- Ceres seq_id 1602311 

- Location of start within SEQ ID NO 2955: at 135 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Cytochrome P450

- Location within SEQ ID NO 2957: from 41 to 105 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339407 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2958 

- Ceres seq_id 1602320 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2959 

- Ceres seq_id 1602321 

- Location of start within SEQ ID NO 2958: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2960 

- Ceres seq_id 1602322 

- Location of start within SEQ ID NO 2958: at 126 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Tubulin 

- Location within SEQ ID NO 2960: from 1 to 109 aa. 



(Dp) Related Amino Acid Sequences

- Alignment No. 1357 

- gi No. 398849 

- Description: (X74656) beta-5 tubulin [Zea mays] 

- % Identity: 91.7 

- Alignment Length: 109

- Location of Alignment in SEQ ID NO 2960: from 1 to 109 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2961

- Ceres seq_id 1602323

- Location of start within SEQ ID NO 2958: at 208 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1358 

- gi No. 398849 

- Description: (X74656) beta-5 tubulin [Zea mays] 

- % Identity: 100 

- Alignment Length: 24 

- Location of Alignment in SEQ ID NO 2961: from 71 to 93 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339778 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2962 

- Ceres seq_id 1602324 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2963 

- Ceres seq_id 1602325 

- Location of start within SEQ ID NO 2962: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2964 

- Ceres seq_id 1602326 

- Location of start within SEQ ID NO 2962: at 203 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Plant lipid transfer protein family 

- Location within SEQ ID NO 2964: from 1 to 64 aa.



(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

339802 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2965

- Ceres seq_id 1602330 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2966 

- Ceres seq_id 1602331

- Location of start within SEQ ID NO 2965: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2967 

- Ceres seq_id 1602332 

- Location of start within SEQ ID NO 2965: at 98 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2968 

- Ceres seq_id 1602333

- Location of start within SEQ ID NO 2965: at 121 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Collagen triple helix repeat (20 copies) 

- Location within SEQ ID NO 2968: from 43 to 90 aa. 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

339821 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2969

- Ceres seq_id 1602334
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2970

- Ceres seq_id 1602335

- Location of start within SEQ ID NO 2969: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1359 

- gi No. 1706260 

- Description: CYSTEINE PROTEINASE 1 PRECURSOR
>gi|2118131|pir||S59597 cysteine proteinase 1 precursor - maize
>gi|643597|dbj|BAA08244| (D45402) cysteine proteinase [Zea mays]

- % Identity: 99 

- Alignment Length: 98 

- Location of Alignment in SEQ ID NO 2970: from 37 to 133

Maximum Length Sequence: 

related to: 
Clone IDs: 

339859 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2971 

- Ceres seq_id 1602346 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2972

- Ceres seq_id 1602347

- Location of start within SEQ ID NO 2971: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1360

- gi No. 4091080 

- Description: (AF045571) nucleic acid binding protein [Oryza sativa]



- % Identity: 89.2 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2972: from 54 to 155

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2973 

- Ceres seq_id 1602348 

- Location of start within SEQ ID NO 2971: at 154 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1361 

- gi No. 4091080 

- Description: (AF045571) nucleic acid binding protein [Oryza sativa]



- % Identity: 89.2 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2973: from 3 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2974

- Ceres seq_id 1602349 

- Location of start within SEQ ID NO 2971: at 241 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



- Alignment No. 1362 

- gi No. 4091080 

- Description: (AF045571) nucleic acid binding protein [Oryza sativa]



- % Identity: 89.2 

- Alignment Length: 102 

- Location of Alignment in SEQ ID NO 2974: from 1 to 75 



Maximum Length Sequence: 

related to: 
Clone IDs: 

339927 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2975 

- Ceres seq_id 1602360 
(B) Polypeptide Sequence 



- Pat. Appln. SEQ ID NO 2976 

- Ceres seq_id 1602361 

- Location of start within SEQ ID NO 2975: at 3 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S7e 

- Location within SEQ ID NO 2976: from 35 to 159 aa. 









(Dp) Related Amino Acid Sequences 

- Alignment No. 1363

- gi No. 4588906

- Description: (AF118149) ribosomal protein S7 [Secale cereale]

- % Identity: 92.4 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 2976: from 29 to 159 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2977 

- Ceres seq_id 1602362 

- Location of start within SEQ ID NO 2975: at 87 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein S7e 

- Location within SEQ ID NO 2977: from 7 to 131 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1364 

- gi No. 4588906 

- Description: (AF118149) ribosomal protein S7 [Secale cereale] 

- % Identity: 92.4 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 2977: from 1 to 131 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2978

- Ceres seq_id 1602363

- Location of start within SEQ ID NO 2975: at 240 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein S7e 

- Location within SEQ ID NO 2978: from 1 to 80 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1365

- gi No. 4588906

- Description: (AF118149) ribosomal protein S7 [Secale cereale]

- % Identity: 92.4 

- Alignment Length: 132 

- Location of Alignment in SEQ ID NO 2978: from 1 to 80 

Maximum Length Sequence:

related to: 
Clone IDs: 

339928 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2979 

- Ceres seq_id 1602364 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2980 

- Ceres seq_id 1602365

- Location of start within SEQ ID NO 2979: at 64 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 2980: from 19 to 140 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1366 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum]

- % Identity: 95.7

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 2980: from 1 to 140 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2981 

- Ceres seq_id 1602366 

- Location of start within SEQ ID NO 2979: at 109 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L14 

- Location within SEQ ID NO 2981: from 4 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1367 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum]

- % Identity: 95.7

- Alignment Length: 140

- Location of Alignment in SEQ ID NO 2981: from 1 to 125 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2982

- Ceres seq_id 1602367

- Location of start within SEQ ID NO 2979: at 241 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 2982: from 1 to 81 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1368 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 95.7 

- Alignment Length: 140 

- Location of Alignment in SEQ ID NO 2982: from 1 to 81 

Maximum Length Sequence: 

related to: 
Clone IDs: 

339985 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2983 

- Ceres seq_id 1602370 
(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 2984

- Ceres seq_id 1602371

- Location of start within SEQ ID NO 2983: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 2984: from 8 to 138 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1369

- gi No. 1345588 

- Description: 14-3-3-LIKE PROTEIN GF14-12 >gi|998432|bbs|164524
GF14-12=GRF2 product/14-3-3 protein homolog [Zea mays, XL80, Peptide, 261 aa]

- % Identity: 100 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 2984: from 1 to 138 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2985 

- Ceres seq_id 1602372 

- Location of start within SEQ ID NO 2983: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 2985: from 1 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1370 

- gi No. 1345588 

- Description: 14-3-3-LIKE PROTEIN GF14-12 >gi|998432|bbs|164524
GF14-12=GRF2 product/14-3-3 protein homolog [Zea mays, XL80, Peptide, 261 aa]

- % Identity: 100 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 2985: from 1 to 125 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2986 

- Ceres seq_id 1602373 

- Location of start within SEQ ID NO 2983: at 157 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- 14-3-3 proteins 

- Location within SEQ ID NO 2986: from 1 to 112 aa.



(Dp) Related Amino Acid Sequences 

- Alignment No. 1371 

- gi No. 1345588 

- Description: 14-3-3-LIKE PROTEIN GF14-12 >gi|998432|bbs|164524
GF14-12=GRF2 product/14-3-3 protein homolog [Zea mays, XL80, Peptide, 261 aa] 

- % Identity: 100 

- Alignment Length: 138 

- Location of Alignment in SEQ ID NO 2986: from 1 to 112 



Maximum Length Sequence: 

related to: 
Clone IDs: 

340215 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2987 

- Ceres seq_id 1602389 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2988

- Ceres seq_id 1602390

- Location of start within SEQ ID NO 2987: at 118 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Actin 

- Location within SEQ ID NO 2988: from 1 to 92 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1372 

- gi No. 231503 

- Description: ACTIN 97 >gi|100417|pir||S20098 actin - potato
>gi|21544|emb|CAA39280| (X55751) actin [Solanum tuberosum]

- % Identity: 95 

- Alignment Length: 119 

- Location of Alignment in SEQ ID NO 2988: from 1 to 92 

Maximum Length Sequence:

related to: 
Clone IDs: 

340271 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2989

- Ceres seq_id 1602391 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2990 

- Ceres seq_id 1602392 

- Location of start within SEQ ID NO 2989: at 1 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- UDP-glucose/GDP-mannose dehydrogenase family 

- Location within SEQ ID NO 2990: from 2 to 152 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1373

- gi No. 1518540 

- Description: (U53418) UDP-glucose dehydrogenase [Glycine max] 

- % Identity: 90.1 

- Alignment Length: 152 

- Location of Alignment in SEQ ID NO 2990: from 2 to 152 

Maximum Length Sequence: 

related to: 
Clone IDs: 

340278 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2991 

- Ceres seq_id 1602394 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2992 

- Ceres seq_id 1602395 

- Location of start within SEQ ID NO 2991: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 2992: from 49 to 118 aa. 



(Dp) Related Amino Acid Sequences 
- Alignment No. 1374

- gi No. 1351974

- Description: ADP-RIBOSYLATION FACTOR >gi|1076788|pir||S49325 ADP-ribosylation
factor - maize >gi|1076789|pir||S53486 ADP-ribosylation factor - maize
>gi|556686|emb|CAA56351| (X80042) ADP-ribosylation factor [Zea mays]

- % Identity: 97.8 

- Alignment Length: 45

- Location of Alignment in SEQ ID NO 2992: from 48 to 92 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2993 

- Ceres seq_id 1602396 

- Location of start within SEQ ID NO 2991: at 144 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 2993: from 2 to 71 aa. 

(Dp) Related Amino Acid Sequences

- Alignment No. 1375

- gi No. 1351974

- Description: ADP-RIBOSYLATION FACTOR >gi|1076788|pir||S49325 ADP-ribosylation
factor - maize >gi|1076789|pir||S53486 ADP-ribosylation factor - maize
>gi|556686|emb|CAA56351| (X80042) ADP-ribosylation factor [Zea mays]

- % Identity: 97.8 

- Alignment Length: 45 

- Location of Alignment in SEQ ID NO 2993: from 1 to 45 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2994 

- Ceres seq_id 1602397 

- Location of start within SEQ ID NO 2991: at 195 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- ADP-ribosylation factor family 

- Location within SEQ ID NO 2994: from 1 to 54 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1376

- gi No. 1351974

- Description: ADP-RIBOSYLATION FACTOR >gi|1076788|pir||S49325 ADP-ribosylation
factor - maize >gi|1076789|pir||S53486 ADP-ribosylation factor - maize
>gi|556686|emb|CAA56351| (X80042) ADP-ribosylation factor [Zea mays]

- % Identity: 97.8 

- Alignment Length: 45

- Location of Alignment in SEQ ID NO 2994: from 1 to 28 

Maximum Length Sequence:

related to: 
Clone IDs: 

340286 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2995 

- Ceres seq_id 1602398 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2996 

- Ceres seq_id 1602399 

- Location of start within SEQ ID NO 2995: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2997 

- Ceres seq_id 1602400 

- Location of start within SEQ ID NO 2995: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1377

- gi No. 5733872

- Description: (AC007932) Similar to gi|4982048 ribosomal protein
L18 from Thermotoga maritima genome gb|AE001798. ESTs gb|Z35613, gb|T75951
gb|T22182, gb|T45962, gb|H76281 and gb|AI100025 come from this gene.
[Arabidopsis thaliana]

- % Identity: 70 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 2997: from 60 to 148 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 2998 

- Ceres seq_id 1602401 

- Location of start within SEQ ID NO 2995: at 68 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1378 

- gi No. 5733872 

- Description: (AC007932) Similar to gi|4982048 ribosomal protein
L18 from Thermotoga maritima genome gb|AE001798. ESTs gb|Z35613, gb|T75951
gb|T22182, gb|T45962, gb|H76281 and gb|AI100025 come from this gene.
[Arabidopsis thaliana] 

- % Identity: 70 

- Alignment Length: 90 

- Location of Alignment in SEQ ID NO 2998: from 38 to 126 

Maximum Length Sequence:

related to: 
Clone IDs: 

340429 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 2999 

- Ceres seq_id 1602402 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3000 

- Ceres seq_id 1602403 

- Location of start within SEQ ID NO 2999: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14

- Location within SEQ ID NO 3000: from 38 to 144 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1379

- gi No. 730536

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 95.2 

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 3000: from 20 to 144 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3001 

- Ceres seq_id 1602404

- Location of start within SEQ ID NO 2999: at 59 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L14 

- Location within SEQ ID NO 3001: from 19 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1380

- gi No. 730536

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum] 

- % Identity: 95.2 

- Alignment Length: 126 

- Location of Alignment in SEQ ID NO 3001: from 1 to 125 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3002 

- Ceres seq_id 1602405 

- Location of start within SEQ ID NO 2999: at 104 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L14 

- Location within SEQ ID NO 3002: from 4 to 110 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1381 

- gi No. 730536 

- Description: 60S RIBOSOMAL PROTEIN L23 >gi|310933 (L18915) 60S
ribosomal protein subunit L17 [Nicotiana tabacum]

- % Identity: 95.2

- Alignment Length: 126

- Location of Alignment in SEQ ID NO 3002: from 1 to 110 

Maximum Length Sequence: 

related to: 
Clone IDs: 

340963 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3003 

- Ceres seq_id 1602411 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3004 

- Ceres seq_id 1602412 

- Location of start within SEQ ID NO 3003: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 3005

- Ceres seq_id 1602413

- Location of start within SEQ ID NO 3003: at 187 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 3005: from 48 to 76 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1382 

- gi No. 20186 

- Description: (X65016) calmodulin [Oryza sativa]
>gi|3336950|emb|CAA74307| (Y13974) calmodulin [Zea mays] >gi|4103961
(AF030034) calmodulin [Phaseolus vulgaris]

- % Identity: 100 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 3005: from 1 to 99 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3006 

- Ceres seq_id 1602414 

- Location of start within SEQ ID NO 3003: at 295 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- EF hand

- Location within SEQ ID NO 3006: from 12 to 40 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1383 

- gi No. 20186 

- Description: (X65016) calmodulin [Oryza sativa]
>gi|3336950|emb|CAA74307| (Y13974) calmodulin [Zea mays] >gi|4103961
(AF030034) calmodulin [Phaseolus vulgaris]

- % Identity: 100 

- Alignment Length: 99 

- Location of Alignment in SEQ ID NO 3006: from 1 to 63

Maximum Length Sequence: 

related to: 
Clone IDs: 

340964 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3007 

- Ceres seq_id 1602415 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3008 

- Ceres seq_id 1602416 

- Location of start within SEQ ID NO 3007: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1384

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 81.1 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 3008: from 24 to 128

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 3009

- Ceres seq_id 1602417

- Location of start within SEQ ID NO 3007: at 66 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 



- Alignment No. 1385 

- gi No. 1350720 

- Description: 60S RIBOSOMAL PROTEIN L32 

- % Identity: 81.1 

- Alignment Length: 111 

- Location of Alignment in SEQ ID NO 3009: from 3 to 107 



Maximum Length Sequence: 

related to: 
Clone IDs: 

341230 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3010 

- Ceres seq_id 1602422 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3011 

- Ceres seq_id 1602423 

- Location of start within SEQ ID NO 3010: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L11

- Location within SEQ ID NO 3011: from 46 to 103 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1386

- gi No. 3986695 

- Description: (AF101423) ribosomal protein L12 [Cichorium intybus] 

- % Identity: 94.3 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 3011: from 34 to 103 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3012 

- Ceres seq_id 1602424

- Location of start within SEQ ID NO 3010: at 100 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L11

- Location within SEQ ID NO 3012: from 13 to 70 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1387 

- gi No. 3986695 

- Description: (AF101423) ribosomal protein L12 [Cichorium intybus] 

- % Identity: 94.3 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 3012: from 1 to 70



Maximum Length Sequence: 

related to: 
Clone IDs:

341243

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3013 

- Ceres seq_id 1602425 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3014 

- Ceres seq_id 1602426

- Location of start within SEQ ID NO 3013: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1388

- gi No. 2239260 

- Description: (Y13734) cinnamoyl CoA reductase [Zea mays] 

- % Identity: 99.3 

- Alignment Length: 150 

- Location of Alignment in SEQ ID NO 3014: from 17 to 165 

Maximum Length Sequence: 

related to: 
Clone IDs: 

340798 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3015 

- Ceres seq_id 1602435
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3016 

- Ceres seq_id 1602436 

- Location of start within SEQ ID NO 3015: at 179 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1389 

- gi No. 417298 

- Description: MFS18 PROTEIN PRECURSOR >gi|100872|pir||S25103 MFS18
protein - maize >gi|22647|emb|CAA47738| (X67324) MFS18 [Zea mays]

- % Identity: 96.9

- Alignment Length: 96

- Location of Alignment in SEQ ID NO 3016: from 1 to 95 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3017 

- Ceres seq_id 1602437 

- Location of start within SEQ ID NO 3015: at 197 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1390 

- gi No. 417298 

- Description: MFS18 PROTEIN PRECURSOR >gi|100872|pir||S25103 MFS18
protein - maize >gi|22647|emb|CAA47738| (X67324) MFS18 [Zea mays]

- % Identity: 96.9 

- Alignment Length: 96 

- Location of Alignment in SEQ ID NO 3017: from 1 to 89 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3018 

- Ceres seq_id 1602438

- Location of start within SEQ ID NO 3015: at 200 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1391 

- gi No. 417298 

- Description: MFS18 PROTEIN PRECURSOR >gi|100872|pir||S25103 MFS18
protein - maize >gi|22647|emb|CAA47738| (X67324) MFS18 [Zea mays]

- % Identity: 96.9 

- Alignment Length: 96 

- Location of Alignment in SEQ ID NO 3018: from 1 to 88 

Maximum Length Sequence: 

related to: 
Clone IDs:

340982 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3019 

- Ceres seq_id 1602439 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3020 

- Ceres seq_id 1602440 

- Location of start within SEQ ID NO 3019: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- AP2 domain 

- Location within SEQ ID NO 3020: from 56 to 113 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1392 

- gi No. 4099914 

- Description: (U91857) ethylene-responsive element binding protein 
homolog [Stylosanthes hamata] 

- % Identity: 71.6 

- Alignment Length: 67 

- Location of Alignment in SEQ ID NO 3020: from 48 to 114

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3021 

- Ceres seq_id 1602441 

- Location of start within SEQ ID NO 3019: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3022 

- Ceres seq_id 1602442 

- Location of start within SEQ ID NO 3019: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to: 
Clone IDs: 

341315

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 3023 

- Ceres seq_id 1602446 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3024 

- Ceres seq_id 1602447 

- Location of start within SEQ ID NO 3023: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1393

- gi No. 349259 

- Description: (L22849) HAHB-2 [Helianthus annuus] 

- % Identity: 72 

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 3024: from 112 to 136 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3025 

- Ceres seq_id 1602448 

- Location of start within SEQ ID NO 3023: at 99 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1394 

- gi No. 349259 

- Description: (L22849) HAHB-2 [Helianthus annuus] 

- % Identity: 72 

- Alignment Length: 25 

- Location of Alignment in SEQ ID NO 3025: from 80 to 104 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3026

- Ceres seq_id 1602449

- Location of start within SEQ ID NO 3023: at 184 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341321 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3027 

- Ceres seq_id 1602450 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3028 

- Ceres seq_id 1602451 

- Location of start within SEQ ID NO 3027: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1395 

- gi No. 5456971 

- Description: (AB024366) ORF1 [TT virus]

- % Identity: 71.4 



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 3028: from 104 to 117 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3029 

- Ceres seq_id 1602452 

- Location of start within SEQ ID NO 3027: at 2 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3030 

- Ceres seq_id 1602453 

- Location of start within SEQ ID NO 3027: at 115 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1396 

- gi No. 5456971 

- Description: (AB024366) ORF1 [TT virus] 

- % Identity: 71.4 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 3030: from 66 to 79

Maximum Length Sequence: 

related to: 
Clone IDs: 

341328 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3031 

- Ceres seq_id 1602454 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3032 

- Ceres seq_id 1602455 

- Location of start within SEQ ID NO 3031: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1397 

- gi No. 262713 

- Description: (S51472) Gag-JunD fusion protein [JDV virus, 
chickens, Peptide Recombinant Partial, 32 aa, segment 2 of 2] [JDV vi 

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 3032: from 66 to 76

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3033 

- Ceres seq_id 1602456 

- Location of start within SEQ ID NO 3031: at 194 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences

Maximum Length Sequence:

related to:
Clone IDs:

341354 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3034 

- Ceres seq_id 1602457 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3035

- Ceres seq_id 1602458 

- Location of start within SEQ ID NO 3034: at 243 nt. 

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain

- Location within SEQ ID NO 3035: from 4 to 76 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3036 

- Ceres seq_id 1602459 

- Location of start within SEQ ID NO 3034: at 303 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Eukaryotic protein kinase domain

- Location within SEQ ID NO 3036: from 1 to 56 aa.

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3037 

- Ceres seq_id 1602460

- Location of start within SEQ ID NO 3034: at 336 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

Maximum Length Sequence:

related to: 
Clone IDs: 

341374 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3038 

- Ceres seq_id 1602461 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3039 

- Ceres seq_id 1602462 

- Location of start within SEQ ID NO 3038: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1398 

- gi No. 3252813 

- Description: (AC004705) vacuolar sorting receptor-like protein 
[Arabidopsis thaliana] >gi|3810586 (AC005398) vacuolar sorting receptor-like
protein [Arabidopsis thaliana] 

- % Identity: 82.8 

- Alignment Length: 163 

- Location of Alignment in SEQ ID NO 3039: from 3 to 164

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 3040

- Ceres seq_id 1602463

- Location of start within SEQ ID NO 3038: at 59 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1399

- gi No. 3252813

- Description: (AC004705) vacuolar sorting receptor-like protein
[Arabidopsis thaliana] >gi|3810586 (AC005398) vacuolar sorting receptor-like
protein [Arabidopsis thaliana] 

- % Identity: 82.8 

- Alignment Length: 163 

- Location of Alignment in SEQ ID NO 3040: from 1 to 145 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3041 

- Ceres seq_id 1602464

- Location of start within SEQ ID NO 3038: at 143 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1400 

- gi No. 3252813 

- Description: (AC004705) vacuolar sorting receptor-like protein 
[Arabidopsis thaliana] >gi|3810586 (AC005398) vacuolar sorting receptor-like
protein [Arabidopsis thaliana] 

- % Identity: 82.8 

- Alignment Length: 163 

- Location of Alignment in SEQ ID NO 3041: from 1 to 117 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341540 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3042 

- Ceres seq_id 1602476 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3043 

- Ceres seq_id 1602477 

- Location of start within SEQ ID NO 3042: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3044

- Ceres seq_id 1602478

- Location of start within SEQ ID NO 3042: at 110 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L34e 

- Location within SEQ ID NO 3044: from 5 to 103 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1401 

- gi No. 730557 

- Description: 60S RIBOSOMAL PROTEIN L34 >gi|2119150|pir||S60476
ribosomal protein L34 - garden pea >gi|498908 (U10047) ribosomal protein L34
homolog [Pisum sativum] 

- % Identity: 90 

- Alignment Length: 120 

- Location of Alignment in SEQ ID NO 3044: from 1 to 119 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341552 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3045 

- Ceres seq_id 1602479 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3046

- Ceres seq_id 1602480

- Location of start within SEQ ID NO 3045: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3047 

- Ceres seq_id 1602481 

- Location of start within SEQ ID NO 3045: at 41 nt.



(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1402 

- gi No. 2150000 

- Description: (AF000939) aleurone ribonuclease [Hordeum vulgare] 

- % Identity: 71.4 

- Alignment Length: 70

- Location of Alignment in SEQ ID NO 3047: from 1 to 69

Maximum Length Sequence:

related to: 
Clone IDs: 

341646 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3048 

- Ceres seq_id 1602482 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3049 

- Ceres seq_id 1602483 

- Location of start within SEQ ID NO 3048: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- CoA-ligases 

- Location within SEQ ID NO 3049: from 13 to 66 aa. 



(Dp) Related Amino Acid Sequences 

- Alignment No. 1403

- gi No. 1711572

- Description: SUCCINYL-COA LIGASE (GDP-FORMING), ALPHA-CHAIN
PRECURSOR (SUCCINYL-COA SYNTHETASE, ALPHA CHAIN) (SCS-ALPHA)
>gi|1076415|pir||S30579 succinate--CoA ligase (GDP-forming) (EC 6.2.1.4)
alpha chain - Arabidopsis thaliana (fragment)

- % Identity: 94.4 

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 3049: from 13 to 66

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3050 

- Ceres seq_id 1602484 

- Location of start within SEQ ID NO 3048: at 48 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- CoA-ligases 

- Location within SEQ ID NO 3050: from 1 to 51 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1404 

- gi No. 1711572 

- Description: SUCCINYL-COA LIGASE (GDP-FORMING), ALPHA-CHAIN
PRECURSOR (SUCCINYL-COA SYNTHETASE, ALPHA CHAIN) (SCS-ALPHA)
>gi|1076415|pir||S30579 succinate--CoA ligase (GDP-forming) (EC 6.2.1.4)
alpha chain - Arabidopsis thaliana (fragment)

- % Identity: 94.4 

- Alignment Length: 54

- Location of Alignment in SEQ ID NO 3050: from 1 to 51 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3051 

- Ceres seq_id 1602485 

- Location of start within SEQ ID NO 3048: at 287 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1405 

- gi No. 1711572 

- Description: SUCCINYL-COA LIGASE (GDP-FORMING), ALPHA-CHAIN
PRECURSOR (SUCCINYL-COA SYNTHETASE, ALPHA CHAIN) (SCS-ALPHA)
>gi|1076415|pir||S30579 succinate--CoA ligase (GDP-forming) (EC 6.2.1.4)
alpha chain - Arabidopsis thaliana (fragment)

- % Identity: 81 

- Alignment Length: 21 

- Location of Alignment in SEQ ID NO 3051: from 36 to 56 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341691 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3052 

- Ceres seq_id 1602486
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3053 

- Ceres seq_id 1602487 

- Location of start within SEQ ID NO 3052: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3054 

- Ceres seq_id 1602488 

- Location of start within SEQ ID NO 3052: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 3054: from 97 to 125 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1406

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 3054: from 50 to 161 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3055 

- Ceres seq_id 1602489 

- Location of start within SEQ ID NO 3052: at 150 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- EF hand 

- Location within SEQ ID NO 3055: from 48 to 76 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1407 

- gi No. 115511 

- Description: CALMODULIN >gi|231682|sp|P29612|CALM_ORYSA
CALMODULIN >gi|71682|pir||MCBH calmodulin - barley >gi|100666|pir||S24952
calmodulin 1 (clone lambda DASH) - rice >gi|20188|emb|CAA78287| (Z12827)
calmodulin [Oryza sativa]

- % Identity: 100 

- Alignment Length: 113 

- Location of Alignment in SEQ ID NO 3055: from 1 to 112 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341909 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3056 

- Ceres seq_id 1602494
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3057 

- Ceres seq_id 1602495 

- Location of start within SEQ ID NO 3056: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 3057: from 27 to 107 aa.

(Dp) Related Amino Acid Sequences

- Alignment No. 1408

- gi No. 2497904 

- Description: METALLOTHIONEIN-LIKE PROTEIN TYPE 2 >gi|1667588
(U77294) metallothionein-like protein [Oryza sativa]
>gi|2326785|emb|CAA69845| (Y08529) metallothionein-like protein [Oryza
sativa] >gi|4097338 (U57638) metallothionein-like protein sativa]

- % Identity: 71.4

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 3057: from 27 to 108 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3058 

- Ceres seq_id 1602496 

- Location of start within SEQ ID NO 3056: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

- Alignment No. 1409

- gi No. 296594 

- Description: (X68600) pZE40 [Hordeum vulgare] 

- % Identity: 70.6 

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 3058: from 26 to 62 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3059

- Ceres seq_id 1602497

- Location of start within SEQ ID NO 3056: at 79 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Metallothionein 

- Location within SEQ ID NO 3059: from 1 to 81 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1410 

- gi No. 2497904 

- Description: METALLOTHIONEIN-LIKE PROTEIN TYPE 2 >gi|1667588
(U77294) metallothionein-like protein [Oryza sativa]
>gi|2326785|emb|CAA69845| (Y08529) metallothionein-like protein [Oryza
sativa] >gi|4097338 (U57638) metallothionein-like protein sativa]

- % Identity: 71.4

- Alignment Length: 86

- Location of Alignment in SEQ ID NO 3059: from 1 to 82

Maximum Length Sequence: 

related to: 
Clone IDs: 

342082 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3060

- Ceres seq_id 1602503 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3061 

- Ceres seq_id 1602504 

- Location of start within SEQ ID NO 3060: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L7Ae

- Location within SEQ ID NO 3061: from 68 to 160 aa. 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1411 

- gi No. 4826860 

- Description: ref|NP_004999.1|pNHP2L1| non-histone chromosome
protein 2 (S. cerevisiae)-like 1 >gi|2500345|sp|P55769|NHPX_HUMAN NHP2/RS6
FAMILY PROTEIN YEL026W HOMOLOG (HIGH MOBILITY GROUP-LIKE NUCLEAR PROTEIN 2
>gi|3859990 (AF091076) OTK27 [Homo sapiens]

- % Identity: 82.4

- Alignment Length: 108

- Location of Alignment in SEQ ID NO 3061: from 53 to 160

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3062 

- Ceres seq_id 1602505

- Location of start within SEQ ID NO 3060: at 156 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Ribosomal protein L7Ae 

- Location within SEQ ID NO 3062: from 17 to 109 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1412 

- gi No. 4826860 

- Description: ref|NP_004999.1|pNHP2L1| non-histone chromosome
protein 2 (S. cerevisiae)-like 1 >gi|2500345|sp|P55769|NHPX_HUMAN NHP2/RS6
FAMILY PROTEIN YEL026W HOMOLOG (HIGH MOBILITY GROUP-LIKE NUCLEAR PROTEIN 2
>gi|3859990 (AF091076) OTK27 [Homo sapiens]

- % Identity: 82.4 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 3062: from 2 to 109 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3063 

- Ceres seq_id 1602506 

- Location of start within SEQ ID NO 3060: at 210 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

- Ribosomal protein L7Ae

- Location within SEQ ID NO 3063: from 1 to 91 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1413 

- gi No. 4826860 

- Description: ref|NP_004999.1|pNHP2L1| non-histone chromosome
protein 2 (S. cerevisiae)-like 1 >gi|2500345|sp|P55769|NHPX_HUMAN NHP2/RS6
FAMILY PROTEIN YEL026W HOMOLOG (HIGH MOBILITY GROUP-LIKE NUCLEAR PROTEIN 2
>gi|3859990 (AF091076) OTK27 [Homo sapiens]

- % Identity: 82.4 

- Alignment Length: 108 

- Location of Alignment in SEQ ID NO 3063: from 1 to 91 

Maximum Length Sequence: 

related to: 
Clone IDs: 

312786 

(Ac) cDNA Polynucleotide Sequence

- Pat. Appln. SEQ ID NO 3064

- Ceres seq_id 1602514
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3065 

- Ceres seq_id 1602515 

- Location of start within SEQ ID NO 3064: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3066 

- Ceres seq_id 1602516 

- Location of start within SEQ ID NO 3064: at 98 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1414 

- gi No. 465445 

- Description: PROBABLE NUCLEAR ANTIGEN >gi|418708|pir||B45344
probable nuclear antigen - suid herpesvirus 1 (strain Kaplan) >gi|334072
(M34651) ORF-3 protein [Pseudorabies virus]

- % Identity: 78.6 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 3066: from 27 to 40 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3067

- Ceres seq_id 1602517

- Location of start within SEQ ID NO 3064: at 125 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 

- Alignment No. 1415 

- gi No. 465445 

- Description: PROBABLE NUCLEAR ANTIGEN >gi|418708|pir||B45344
probable nuclear antigen - suid herpesvirus 1 (strain Kaplan) >gi|334072
(M34651) ORF-3 protein [Pseudorabies virus]

- % Identity: 78.6 

- Alignment Length: 14 

- Location of Alignment in SEQ ID NO 3067: from 18 to 31 

Maximum Length Sequence: 

related to: 
Clone IDs: 

319174 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3068 

- Ceres seq_id 1602518 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3069 

- Ceres seq_id 1602519 

- Location of start within SEQ ID NO 3068: at 1 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 3070 

- Ceres seq_id 1602520 

- Location of start within SEQ ID NO 3068: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3071 

- Ceres seq_id 1602521 

- Location of start within SEQ ID NO 3068: at 3 nt. 

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s)

(Dp) Related Amino Acid Sequences 

- Alignment No. 1416 

- gi No. 3810850 

- Description: (AL032684) proline-rich protein [Schizosaccharomyces
pombe]

- % Identity: 72.7 

- Alignment Length: 11 

- Location of Alignment in SEQ ID NO 3071: from 37 to 47 

Maximum Length Sequence: 

related to: 
Clone IDs: 

341463 

(Ac) cDNA Polynucleotide Sequence 

- Pat. Appln. SEQ ID NO 3072 

- Ceres seq_id 1602530 
(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3073 

- Ceres seq_id 1602531 

- Location of start within SEQ ID NO 3072: at 2 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

- Transketolase 

- Location within SEQ ID NO 3073: from 68 to 150 aa.

(Dp) Related Amino Acid Sequences 

- Alignment No. 1417 

- gi No. 3851003 

- Description: (AF069910) pyruvate dehydrogenase E1 beta subunit
isoform 3 [Zea mays] 

- % Identity: 98.7 

- Alignment Length: 151 

- Location of Alignment in SEQ ID NO 3073: from 1 to 150 

(B) Polypeptide Sequence 

- Pat. Appln. SEQ ID NO 3074 

- Ceres seq_id 1602532 

- Location of start within SEQ ID NO 3072: at 3 nt.

(C) Nomination and Annotation of Domains within Predicted
Polypeptide (s)

(Dp) Related Amino Acid Sequences

(B) Polypeptide Sequence

- Pat. Appln. SEQ ID NO 3075

- Ceres seq_id 1602533

- Location of start within SEQ ID NO 3072: at 21 nt.

(C) Nomination and Annotation of Domains within Predicted 
Polypeptide (s) 

(Dp) Related Amino Acid Sequences 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 870 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..870

(D) OTHER INFORMATION: / Ceres Seq. ID 1580171
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
attataactt ggatcagatt cttcttctcc tctaaacaac aacaaagtct ctcttccttc 60
tgtctttcac atttcctctt cccctaaaaa cagaactcaa aaccaaaaca aaatgactcg 120
ctcctctaga ttccttggta cggcgtcgcc tccaccaccg gaagaaatcc tcgccgccga 180
aaccgacatg gttgtgattc tctcagcact tctctgcgct ctcgtatgcg taccggttta 240
gccgccgtag ctcgctgcgc ttggctccgc cgtctcaccg gagttaatcc cgccgccgtc 300
ggagaagctc caccgcctaa caaaggcctc aagaaaaaag cacttcaagc tctccctaaa 360
tcaacttaca ccgcctctgc ctctaccgcc gccgccgctg acgatctacc gtgctcttca 420
gtcggagacg gagattcatc caccgagtgt gcaatatgca taacggaatt ttcggagggc 480
gaagaaatca gaatcttacc gctgtgtagt cacgccttcc acgtggcgtg tatagacaag 540
tggttgactt cacggtcgtc gtgtccgtct tgtcgtcgga tattggtgcc ggtgaagtgt 600
gataggtgtg ggcaccatgc ttcgacggcg gagactcagg tcaaagatca acctcctcat 660
caccaacatc cttcacagtt cacttccgcc attattcctg cttttcttcc ttgaaatgtt 720
ttcactttat taaattttta taatttgcta aaaatatcgg catgattaaa agtaaagtat 780
taattaatta ggaattagta atctttagtt aagtactttt atatttctcc tctgtaatcc 840
tctgttctga attaatcaga tgaaatttcg
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 162 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..162 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580172 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
Met Arg Thr Gly Leu Ala Ala Val Ala Arg Cys Ala Trp Leu Arg Arg
1               5                   10                  15

Leu Thr Gly Val Asn Pro Ala Ala Val Gly Glu Ala Pro Pro Pro Asn
            20                  25                  30

Lys Gly Leu Lys Lys Lys Ala Leu Gln Ala Leu Pro Lys Ser Thr Tyr
        35                  40                  45

Thr Ala Ser Ala Ser Thr Ala Ala Ala Ala Asp Asp Leu Pro Cys Ser
    50                  55                  60

Ser Val Gly Asp Gly Asp Ser Ser Thr Glu Cys Ala Ile Cys Ile Thr
65                  70                  75                  80

Glu Phe Ser Glu Gly Glu Glu Ile Arg Ile Leu Pro Leu Cys Ser His
                85                  90                  95

Ala Phe His Val Ala Cys Ile Asp Lys Trp Leu Thr Ser Arg Ser Ser
            100                 105                 110

Cys Pro Ser Cys Arg Arg Ile Leu Val Pro Val Lys Cys Asp Arg Cys
        115                 120                 125

Gly His His Ala Ser Thr Ala Glu Thr Gln Val Lys Asp Gln Pro Pro
    130                 135                 140

His His Gln His Pro Ser Gln Phe Thr Ser Ala Ile Ile Pro Ala Phe
145                 150                 155                 160

Leu Pro



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
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(2) INFORMATION FOR SEQ ID NO: 3: 



(i) 



(ii) 
(ix) 



SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 662 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

( D ) TOPOLOGY : linear 
MOLECULE TYPE: 
FEATURE: 

(A) NAME /KEY : 

(B) LOCATION: 



DNA (genomic) 



ID 1580240 



(xi) 



1. . 662 

(D) OTHER INFORMATION: / Ceres Seq. 
SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
aagagacaaa agtgattgat tggtcagaaa aaaatggaag cggcgaagaa acagagtgtt 
acaaatcagc ttctcgccgt gaaatcagct tccggcaaga cttttagcca gttagcggcg 
gagacaggtc taaccaacgt atacgtagct cagcttctcc gtcgtcaagc tcagctcaaa 
cccgacacag tcccaaagct taagaaagct ttaccggctc tgaacgatga actaatcgga 
gctatgatgt ctccaccgtg gagatcctac gatcctaatc tcatccaaga acccactatc 
tacaggttga atgaagcagt gatgcatttt ggtgagagta taaaggagat tatcaatgaa 
gattttggag atggcatcat gtcggcgata gatttctatt gctctgtcga caaaatcaaa 
ggagtggatg gtaacaatcg cgtggttgtg acgcttgatg gaaagtatct ttcgcattcc 
gaacagagga cggagaatat ggtctcaagg ctaaatctca agggaggtac aagcgaatga 
taagaaagcc tttacgtatc catgaaggcc ttattgtaag tggtaacgtt gtaataccta 
tgtgtttgtt tatctgtaat atatgcaact tcagcatcta gattaaaagc tgtttcaggt 
tg 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 168 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..168

(D) OTHER INFORMATION: / Ceres Seq. ID 1580241
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Met Glu Ala Ala Lys Lys Gln Ser Val Thr Asn Gln Leu Leu Ala Val
1               5                   10                  15

Lys Ser Ala Ser Gly Lys Thr Phe Ser Gln Leu Ala Ala Glu Thr Gly
            20                  25                  30

Leu Thr Asn Val Tyr Val Ala Gln Leu Leu Arg Arg Gln Ala Gln Leu
        35                  40                  45

Lys Pro Asp Thr Val Pro Lys Leu Lys Lys Ala Leu Pro Ala Leu Asn
    50                  55                  60

Asp Glu Leu Ile Gly Ala Met Met Ser Pro Pro Trp Arg Ser Tyr Asp
65                  70                  75                  80

Pro Asn Leu Ile Gln Glu Pro Thr Ile Tyr Arg Leu Asn Glu Ala Val
                85                  90                  95

Met His Phe Gly Glu Ser Ile Lys Glu Ile Ile Asn Glu Asp Phe Gly
            100                 105                 110

Asp Gly Ile Met Ser Ala Ile Asp Phe Tyr Cys Ser Val Asp Lys Ile
        115                 120                 125

Lys Gly Val Asp Gly Asn Asn Arg Val Val Val Thr Leu Asp Gly Lys
    130                 135                 140

Tyr Leu Ser His Ser Glu Gln Arg Thr Glu Asn Met Val Ser Arg Leu
145                 150                 155                 160

Asn Leu Lys Gly Gly Thr Ser Glu
                165

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 98 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..98

(D) OTHER INFORMATION: / Ceres Seq. ID 1580242
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
Met Met Ser Pro Pro Trp Arg Ser Tyr Asp Pro Asn Leu Ile Gln Glu
1               5                   10                  15

Pro Thr Ile Tyr Arg Leu Asn Glu Ala Val Met His Phe Gly Glu Ser
            20                  25                  30

Ile Lys Glu Ile Ile Asn Glu Asp Phe Gly Asp Gly Ile Met Ser Ala
        35                  40                  45

Ile Asp Phe Tyr Cys Ser Val Asp Lys Ile Lys Gly Val Asp Gly Asn
    50                  55                  60

Asn Arg Val Val Val Thr Leu Asp Gly Lys Tyr Leu Ser His Ser Glu
65                  70                  75                  80

Gln Arg Thr Glu Asn Met Val Ser Arg Leu Asn Leu Lys Gly Gly Thr
                85                  90                  95

Ser Glu

(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..97

(D) OTHER INFORMATION: / Ceres Seq. ID 1580243
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Met Ser Pro Pro Trp Arg Ser Tyr Asp Pro Asn Leu Ile Gln Glu Pro
1               5                   10                  15

Thr Ile Tyr Arg Leu Asn Glu Ala Val Met His Phe Gly Glu Ser Ile
            20                  25                  30

Lys Glu Ile Ile Asn Glu Asp Phe Gly Asp Gly Ile Met Ser Ala Ile
        35                  40                  45

Asp Phe Tyr Cys Ser Val Asp Lys Ile Lys Gly Val Asp Gly Asn Asn
    50                  55                  60

Arg Val Val Val Thr Leu Asp Gly Lys Tyr Leu Ser His Ser Glu Gln
65                  70                  75                  80

Arg Thr Glu Asn Met Val Ser Arg Leu Asn Leu Lys Gly Gly Thr Ser
                85                  90                  95

Glu

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1621 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1621 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580263 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ctccatagtt ccagttctga aatctcactt tcacagtttg tgtctgtgcg ataatggcca 60
tggcagagct ctcaaccccc aaaacgacgt cgccttttct caactcttcg tctcggcttc 120
gtctctcttc aaaattgcac ctttcaaacc actttcgcca tcttcttctt ccacctctcc 180
acacaacaac tcccaactcc aaaatctctt gctccgtttc tcaaaatagc caagctcctg 240
ttgctgtgca agaaaatgga ttggtgaaga cgaagaaaga gtgttatgga gtgttctgcc 300
tcacctatga tcttaaagct gaagaagaga caagatcatg gaagaagtta attaatattg 360
cagtttcagg tgctgcagga atgatttcta accatcttct cttcaaactt gcttcagggg 420
aagtatttgg tccagatcaa cccattgcat tgaaactgct aggatcagag agatcaattc 480
aagctcttga aggtgttgca atggaactgg aggattcatt gttcccattg ttgagagaag 540
ttgatatagg aacagatcca aatgaagtgt tccaagatgt ggagtgggct attctgattg 600
gagcaaaacc tcgaggccct ggaatggaac gtgctgactt gttggacatc aatggccaaa 660
tctttgctga gcagggcaaa gctctgaaca aagctgcctc tcctaacgtc aaggttcttg 720
tagtgggaaa cccttgcaac accaatgcct tgatttgtct taaaaatgct cccaacattc 780
ctgcaaagaa cttccatgcc ctcacgaggt tagacgaaaa tcgtgccaaa tgccagcttg 840
ctcttaaagc cggtgttttc tatgacaaag tgtctaatat gaccatatgg ggaaatcact 900
ccacgactca ggtgccagac ttcttaaatg ccagaattaa tggcctgcct gtgaaggagg 960
ttattacaga tcacaaatgg ttagaagagg gattcactga gagtgtgcag aagagaggtg 1020
ggttattaat tcagaaatgg ggtcgatctt ctgctgcttc tactgctgtt tccattgttg 1080
atgctataaa gtctcttgta actcctactc ctgagggtga ttggttttcg actggggtgt 1140
acacggatgg aaatccttat ggtattgaag agggccttgt cttcagtatg ctatgccggt 1200
cgaagggaga tggagattat gaacttgtca aggatgtaga aattgatgac taccttcgcc 1260
aacgaatcgc caagtcggaa gcggaactgt tggctgagaa gagatgtgtt gcacacctca 1320
ctggagaggg cattgcctac tgtgatcttg gtccggtaga tactatgctt cctggggaag 1380
tttgattttt gcaggacgtt gaacatctca agtaagcatt ctcttccggg ttgttagctg 1440
tacagagcac agccacatta cttatgatga ttgttcagaa taagaaaatg aaactcttat 1500
ttcttattta catgcatctg tatgtgattt ttcttgagca atgctccaaa agtcatatac 1560
agtagtattt gtaaacactt gaaacgtttc tatgctttat tccagtttca gaactcaaac 1620
t

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 443 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..443

(D) OTHER INFORMATION: / Ceres Seq. ID 1580264
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Met Ala Met Ala Glu Leu Ser Thr Pro Lys Thr Thr Ser Pro Phe Leu
1               5                   10                  15

Asn Ser Ser Ser Arg Leu Arg Leu Ser Ser Lys Leu His Leu Ser Asn
            20                  25                  30

His Phe Arg His Leu Leu Leu Pro Pro Leu His Thr Thr Thr Pro Asn
        35                  40                  45

Ser Lys Ile Ser Cys Ser Val Ser Gln Asn Ser Gln Ala Pro Val Ala
    50                  55                  60

Val Gln Glu Asn Gly Leu Val Lys Thr Lys Lys Glu Cys Tyr Gly Val
65                  70                  75                  80

Phe Cys Leu Thr Tyr Asp Leu Lys Ala Glu Glu Glu Thr Arg Ser Trp
                85                  90                  95

Lys Lys Leu Ile Asn Ile Ala Val Ser Gly Ala Ala Gly Met Ile Ser
            100                 105                 110

Asn His Leu Leu Phe Lys Leu Ala Ser Gly Glu Val Phe Gly Pro Asp
        115                 120                 125

Gln Pro Ile Ala Leu Lys Leu Leu Gly Ser Glu Arg Ser Ile Gln Ala
    130                 135                 140

Leu Glu Gly Val Ala Met Glu Leu Glu Asp Ser Leu Phe Pro Leu Leu
145                 150                 155                 160

Arg Glu Val Asp Ile Gly Thr Asp Pro Asn Glu Val Phe Gln Asp Val
                165                 170                 175

Glu Trp Ala Ile Leu Ile Gly Ala Lys Pro Arg Gly Pro Gly Met Glu
            180                 185                 190

Arg Ala Asp Leu Leu Asp Ile Asn Gly Gln Ile Phe Ala Glu Gln Gly
        195                 200                 205

Lys Ala Leu Asn Lys Ala Ala Ser Pro Asn Val Lys Val Leu Val Val
    210                 215                 220

Gly Asn Pro Cys Asn Thr Asn Ala Leu Ile Cys Leu Lys Asn Ala Pro
225                 230                 235                 240

Asn Ile Pro Ala Lys Asn Phe His Ala Leu Thr Arg Leu Asp Glu Asn
                245                 250                 255

Arg Ala Lys Cys Gln Leu Ala Leu Lys Ala Gly Val Phe Tyr Asp Lys
            260                 265                 270

Val Ser Asn Met Thr Ile Trp Gly Asn His Ser Thr Thr Gln Val Pro
        275                 280                 285

Asp Phe Leu Asn Ala Arg Ile Asn Gly Leu Pro Val Lys Glu Val Ile
    290                 295                 300

Thr Asp His Lys Trp Leu Glu Glu Gly Phe Thr Glu Ser Val Gln Lys
305                 310                 315                 320

Arg Gly Gly Leu Leu Ile Gln Lys Trp Gly Arg Ser Ser Ala Ala Ser
                325                 330                 335

Thr Ala Val Ser Ile Val Asp Ala Ile Lys Ser Leu Val Thr Pro Thr
            340                 345                 350

Pro Glu Gly Asp Trp Phe Ser Thr Gly Val Tyr Thr Asp Gly Asn Pro
        355                 360                 365

Tyr Gly Ile Glu Glu Gly Leu Val Phe Ser Met Leu Cys Arg Ser Lys
    370                 375                 380

Gly Asp Gly Asp Tyr Glu Leu Val Lys Asp Val Glu Ile Asp Asp Tyr
385                 390                 395                 400

Leu Arg Gln Arg Ile Ala Lys Ser Glu Ala Glu Leu Leu Ala Glu Lys
                405                 410                 415

Arg Cys Val Ala His Leu Thr Gly Glu Gly Ile Ala Tyr Cys Asp Leu
            420                 425                 430

Gly Pro Val Asp Thr Met Leu Pro Gly Glu Val
        435                 440
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 441 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..441 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580265 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
Met Ala Glu Leu Ser Thr Pro Lys Thr Thr Ser Pro Phe Leu Asn Ser
1               5                   10                  15

Ser Ser Arg Leu Arg Leu Ser Ser Lys Leu His Leu Ser Asn His Phe
            20                  25                  30

Arg His Leu Leu Leu Pro Pro Leu His Thr Thr Thr Pro Asn Ser Lys
        35                  40                  45

Ile Ser Cys Ser Val Ser Gln Asn Ser Gln Ala Pro Val Ala Val Gln
    50                  55                  60

Glu Asn Gly Leu Val Lys Thr Lys Lys Glu Cys Tyr Gly Val Phe Cys
65                  70                  75                  80

Leu Thr Tyr Asp Leu Lys Ala Glu Glu Glu Thr Arg Ser Trp Lys Lys
                85                  90                  95

Leu Ile Asn Ile Ala Val Ser Gly Ala Ala Gly Met Ile Ser Asn His
            100                 105                 110

Leu Leu Phe Lys Leu Ala Ser Gly Glu Val Phe Gly Pro Asp Gln Pro
        115                 120                 125

Ile Ala Leu Lys Leu Leu Gly Ser Glu Arg Ser Ile Gln Ala Leu Glu
    130                 135                 140

Gly Val Ala Met Glu Leu Glu Asp Ser Leu Phe Pro Leu Leu Arg Glu
145                 150                 155                 160

Val Asp Ile Gly Thr Asp Pro Asn Glu Val Phe Gln Asp Val Glu Trp
                165                 170                 175

Ala Ile Leu Ile Gly Ala Lys Pro Arg Gly Pro Gly Met Glu Arg Ala
            180                 185                 190

Asp Leu Leu Asp Ile Asn Gly Gln Ile Phe Ala Glu Gln Gly Lys Ala
        195                 200                 205

Leu Asn Lys Ala Ala Ser Pro Asn Val Lys Val Leu Val Val Gly Asn
    210                 215                 220

Pro Cys Asn Thr Asn Ala Leu Ile Cys Leu Lys Asn Ala Pro Asn Ile
225                 230                 235                 240

Pro Ala Lys Asn Phe His Ala Leu Thr Arg Leu Asp Glu Asn Arg Ala
                245                 250                 255

Lys Cys Gln Leu Ala Leu Lys Ala Gly Val Phe Tyr Asp Lys Val Ser
            260                 265                 270

Asn Met Thr Ile Trp Gly Asn His Ser Thr Thr Gln Val Pro Asp Phe
        275                 280                 285

Leu Asn Ala Arg Ile Asn Gly Leu Pro Val Lys Glu Val Ile Thr Asp
    290                 295                 300

His Lys Trp Leu Glu Glu Gly Phe Thr Glu Ser Val Gln Lys Arg Gly
305                 310                 315                 320

Gly Leu Leu Ile Gln Lys Trp Gly Arg Ser Ser Ala Ala Ser Thr Ala
                325                 330                 335

Val Ser Ile Val Asp Ala Ile Lys Ser Leu Val Thr Pro Thr Pro Glu
            340                 345                 350

Gly Asp Trp Phe Ser Thr Gly Val Tyr Thr Asp Gly Asn Pro Tyr Gly
        355                 360                 365

Ile Glu Glu Gly Leu Val Phe Ser Met Leu Cys Arg Ser Lys Gly Asp
    370                 375                 380

Gly Asp Tyr Glu Leu Val Lys Asp Val Glu Ile Asp Asp Tyr Leu Arg
385                 390                 395                 400

Gln Arg Ile Ala Lys Ser Glu Ala Glu Leu Leu Ala Glu Lys Arg Cys
                405                 410                 415

Val Ala His Leu Thr Gly Glu Gly Ile Ala Tyr Cys Asp Leu Gly Pro
            420                 425                 430

Val Asp Thr Met Leu Pro Gly Glu Val
        435                 440


















(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 334 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..334

(D) OTHER INFORMATION: / Ceres Seq. ID 1580266
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
Met Ile Ser Asn His Leu Leu Phe Lys Leu Ala Ser Gly Glu Val Phe
1               5                   10                  15

Gly Pro Asp Gln Pro Ile Ala Leu Lys Leu Leu Gly Ser Glu Arg Ser
            20                  25                  30

Ile Gln Ala Leu Glu Gly Val Ala Met Glu Leu Glu Asp Ser Leu Phe
        35                  40                  45

Pro Leu Leu Arg Glu Val Asp Ile Gly Thr Asp Pro Asn Glu Val Phe
    50                  55                  60

Gln Asp Val Glu Trp Ala Ile Leu Ile Gly Ala Lys Pro Arg Gly Pro
65                  70                  75                  80

Gly Met Glu Arg Ala Asp Leu Leu Asp Ile Asn Gly Gln Ile Phe Ala
                85                  90                  95

Glu Gln Gly Lys Ala Leu Asn Lys Ala Ala Ser Pro Asn Val Lys Val
            100                 105                 110

Leu Val Val Gly Asn Pro Cys Asn Thr Asn Ala Leu Ile Cys Leu Lys
        115                 120                 125

Asn Ala Pro Asn Ile Pro Ala Lys Asn Phe His Ala Leu Thr Arg Leu
    130                 135                 140

Asp Glu Asn Arg Ala Lys Cys Gln Leu Ala Leu Lys Ala Gly Val Phe
145                 150                 155                 160

Tyr Asp Lys Val Ser Asn Met Thr Ile Trp Gly Asn His Ser Thr Thr
                165                 170                 175

Gln Val Pro Asp Phe Leu Asn Ala Arg Ile Asn Gly Leu Pro Val Lys
            180                 185                 190

Glu Val Ile Thr Asp His Lys Trp Leu Glu Glu Gly Phe Thr Glu Ser
        195                 200                 205

Val Gln Lys Arg Gly Gly Leu Leu Ile Gln Lys Trp Gly Arg Ser Ser
    210                 215                 220

Ala Ala Ser Thr Ala Val Ser Ile Val Asp Ala Ile Lys Ser Leu Val
225                 230                 235                 240

Thr Pro Thr Pro Glu Gly Asp Trp Phe Ser Thr Gly Val Tyr Thr Asp
                245                 250                 255

Gly Asn Pro Tyr Gly Ile Glu Glu Gly Leu Val Phe Ser Met Leu Cys
            260                 265                 270

Arg Ser Lys Gly Asp Gly Asp Tyr Glu Leu Val Lys Asp Val Glu Ile
        275                 280                 285

Asp Asp Tyr Leu Arg Gln Arg Ile Ala Lys Ser Glu Ala Glu Leu Leu
    290                 295                 300

Ala Glu Lys Arg Cys Val Ala His Leu Thr Gly Glu Gly Ile Ala Tyr
305                 310                 315                 320

Cys Asp Leu Gly Pro Val Asp Thr Met Leu Pro Gly Glu Val
                325                 330
(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 987 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..987

(D) OTHER INFORMATION: / Ceres Seq. ID 1580285
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
acaaattcac tttccacaat taatctctct ctctctttat ctccctcaca cctcagcctt 60
ctgtgttcgt ctctatctct ctcccaaatc tcaattttgg actcgtctgc ttttgtttat 120
ataaacactt ctcttcttct tcaatctcca cacaccacaa caaaaccacc taccttcttc 180
gatttctcca gaaatcctcc ttcaagttca tactcatgga agaagctaaa gtggaagcaa 240
aagatggtac catctccgta gcttctgcat tttccggtca ccaacaagct gttcacgata 300
gcgaccataa attcctaacg caagctgttg aagaagctta caagggagtg gactgtggtg 360
atggtggccc atttggtgcg gtgattgtgc ataataacga ggttgtcgct agttgccaca 420
atatggtttt gaaatatact gacccaactg cacatgctga agtcaccgcc attagagagg 480
catgtaagaa acttaacaaa atcgagttat cagaatgcga gatttatgca tcttgtgagc 540
cgtgtcccat gtgcttcgga gccatccatc tctcgagact caagaggttg gtttatggag 600
ccaaagccga agcagctata gccatcgggt ttgatgactt catagctgat gctttgagag 660
gcacaggggt ttaccagaaa tctagtctgg agatcaagaa agctgacggg aatggcgctg 720
cgattgcgga gcaagttttc cagaacacta aggagaagtt ccgtttatac tgaagcactt 780
ctccattacc acattcttct tcttcttctc cttcatgatg gtaattagaa gtagtagaag 840
aagaactgca cttgccttga ctaaaataaa tctctgtttc aaattgttca ttttaaaaac 900
agaaggaaac gaaaaaaaaa aaaaaatcaa tttcggttct tgttgacacg atttgtaatt 960
tcttttcttc atttgaattg aactggc
(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 185 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..185 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580286 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
Met Glu Glu Ala Lys Val Glu Ala Lys Asp Gly Thr Ile Ser Val Ala
1               5                   10                  15

Ser Ala Phe Ser Gly His Gln Gln Ala Val His Asp Ser Asp His Lys
            20                  25                  30

Phe Leu Thr Gln Ala Val Glu Glu Ala Tyr Lys Gly Val Asp Cys Gly
        35                  40                  45

Asp Gly Gly Pro Phe Gly Ala Val Ile Val His Asn Asn Glu Val Val
    50                  55                  60

Ala Ser Cys His Asn Met Val Leu Lys Tyr Thr Asp Pro Thr Ala His
65                  70                  75                  80

Ala Glu Val Thr Ala Ile Arg Glu Ala Cys Lys Lys Leu Asn Lys Ile
                85                  90                  95

Glu Leu Ser Glu Cys Glu Ile Tyr Ala Ser Cys Glu Pro Cys Pro Met
            100                 105                 110

Cys Phe Gly Ala Ile His Leu Ser Arg Leu Lys Arg Leu Val Tyr Gly
        115                 120                 125

Ala Lys Ala Glu Ala Ala Ile Ala Ile Gly Phe Asp Asp Phe Ile Ala
    130                 135                 140

Asp Ala Leu Arg Gly Thr Gly Val Tyr Gln Lys Ser Ser Leu Glu Ile
145                 150                 155                 160

Lys Lys Ala Asp Gly Asn Gly Ala Ala Ile Ala Glu Gln Val Phe Gln
                165                 170                 175

Asn Thr Lys Glu Lys Phe Arg Leu Tyr
            180                 185
(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY: peptide 

(B) LOCATION: 1..116 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580287 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
Met Val Leu Lys Tyr Thr Asp Pro Thr Ala His Ala Glu Val Thr Ala
1               5                   10                  15

Ile Arg Glu Ala Cys Lys Lys Leu Asn Lys Ile Glu Leu Ser Glu Cys
            20                  25                  30

Glu Ile Tyr Ala Ser Cys Glu Pro Cys Pro Met Cys Phe Gly Ala Ile
        35                  40                  45

His Leu Ser Arg Leu Lys Arg Leu Val Tyr Gly Ala Lys Ala Glu Ala
    50                  55                  60

Ala Ile Ala Ile Gly Phe Asp Asp Phe Ile Ala Asp Ala Leu Arg Gly
65                  70                  75                  80

Thr Gly Val Tyr Gln Lys Ser Ser Leu Glu Ile Lys Lys Ala Asp Gly
                85                  90                  95

Asn Gly Ala Ala Ile Ala Glu Gln Val Phe Gln Asn Thr Lys Glu Lys
            100                 105                 110

Phe Arg Leu Tyr
        115

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1244 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580305 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
attcatcaaa tcctcaaaca acttcaagaa caaagagtgc ttttcgatgc ccaaaacaca 60
ttaccataaa atggcattct cgaagggact catttttgcg atgatatttg cggttttagc 120
gatagttaag ccctcagaag ctgctcttga tgctcattac tatgatcaat cgtgccctgc 180
tgctgaaaaa atcatacttg aaactgttag gaatgctact ttgtatgatc ccaaagtgcc 240
tgctcgtctc ctaagaatgt tcttccacga ttgcttcatc aggggatgtg atgcgtccat 300
tctactagat tcgacacggt caaaccaagc tgagaaggat ggtcctccaa acatctcagt 360
acggtcattt tacgtgatcg aagatgctaa gagaaagctc gaaaaggctt gtcctcgtac 420
cgtgtcttgt gccgatgtaa tcgccattgc cgccagagat gtggtcaccc tgtccggtgg 480
tccttattgg agcgtactga aagggcgaaa agacgggaca atttcacgag ccaacgagac 540
tagaaatctc ccaccaccaa cgttcaatgt ttctcaactg atacaaagct ttgcagcaag 600
aggcttgtcg gtgaaagaca tggttacgct ctcaggtggc cacaccatag ggttctctca 660
ctgttcttct ttcgagtccc gtcttcaaaa cttcagcaaa ttccatgaca tagatccttc 720
gatgaactat gcgttcgcgc aaaccttgaa aaagaaatgc ccgagaacat ctaaccgagg 780
caagaacgca gggacagtct tggactctac ttcctcggtt ttcgacaatg tttactacaa 840
gcaaattttg tccgggaaag gagtgtttgg gtctgatcag gcacttctag gcgattcccg 900
gactaagtgg atcgttgaga cgtttgctca agaccaaaaa gctttcttca gagagtttgc 960
agcttccatg gtaaaacttg gaaactttgg agtcaaagag actggacaag ttagagttaa 1020
cactcgcttc gtcaactaaa agcataaccc taacaagcaa ccaaatgaga gttttctttt 1080
cttcaaattt gatttcattt atacttgata ttataataat gtagaccggc caatatatgg 1140
atcggtttta aaacctccta attctctatt gagttgcaga gtatgcttta aaagtaaatt 1200
tgttctgctt cattgaactg atctataaat tcatcagcat tttc
(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..345

(D) OTHER INFORMATION: / Ceres Seq. ID 1580306
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Phe Ile Lys Ser Ser Asn Asn Phe Lys Asn Lys Glu Cys Phe Ser Met
1               5                   10                  15

Pro Lys Thr His Tyr His Lys Met Ala Phe Ser Lys Gly Leu Ile Phe
            20                  25                  30

Ala Met Ile Phe Ala Val Leu Ala Ile Val Lys Pro Ser Glu Ala Ala
        35                  40                  45

Leu Asp Ala His Tyr Tyr Asp Gln Ser Cys Pro Ala Ala Glu Lys Ile
    50                  55                  60

Ile Leu Glu Thr Val Arg Asn Ala Thr Leu Tyr Asp Pro Lys Val Pro
65                  70                  75                  80

Ala Arg Leu Leu Arg Met Phe Phe His Asp Cys Phe Ile Arg Gly Cys
                85                  90                  95

Asp Ala Ser Ile Leu Leu Asp Ser Thr Arg Ser Asn Gln Ala Glu Lys
            100                 105                 110

Asp Gly Pro Pro Asn Ile Ser Val Arg Ser Phe Tyr Val Ile Glu Asp
        115                 120                 125

Ala Lys Arg Lys Leu Glu Lys Ala Cys Pro Arg Thr Val Ser Cys Ala
    130                 135                 140

Asp Val Ile Ala Ile Ala Ala Arg Asp Val Val Thr Leu Ser Gly Gly
145                 150                 155                 160

Pro Tyr Trp Ser Val Leu Lys Gly Arg Lys Asp Gly Thr Ile Ser Arg
                165                 170                 175

Ala Asn Glu Thr Arg Asn Leu Pro Pro Pro Thr Phe Asn Val Ser Gln
            180                 185                 190

Leu Ile Gln Ser Phe Ala Ala Arg Gly Leu Ser Val Lys Asp Met Val
        195                 200                 205

Thr Leu Ser Gly Gly His Thr Ile Gly Phe Ser His Cys Ser Ser Phe
    210                 215                 220

Glu Ser Arg Leu Gln Asn Phe Ser Lys Phe His Asp Ile Asp Pro Ser
225                 230                 235                 240

Met Asn Tyr Ala Phe Ala Gln Thr Leu Lys Lys Lys Cys Pro Arg Thr
                245                 250                 255

Ser Asn Arg Gly Lys Asn Ala Gly Thr Val Leu Asp Ser Thr Ser Ser
            260                 265                 270

Val Phe Asp Asn Val Tyr Tyr Lys Gln Ile Leu Ser Gly Lys Gly Val
        275                 280                 285

Phe Gly Ser Asp Gln Ala Leu Leu Gly Asp Ser Arg Thr Lys Trp Ile
    290                 295                 300

Val Glu Thr Phe Ala Gln Asp Gln Lys Ala Phe Phe Arg Glu Phe Ala
305                 310                 315                 320

Ala Ser Met Val Lys Leu Gly Asn Phe Gly Val Lys Glu Thr Gly Gln
                325                 330                 335

Val Arg Val Asn Thr Arg Phe Val Asn
            340                 345
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 330 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..330

(D) OTHER INFORMATION: / Ceres Seq. ID 1580307
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Met Pro Lys Thr His Tyr His Lys Met Ala Phe Ser Lys Gly Leu Ile
1               5                   10                  15

Phe Ala Met Ile Phe Ala Val Leu Ala Ile Val Lys Pro Ser Glu Ala
            20                  25                  30

Ala Leu Asp Ala His Tyr Tyr Asp Gln Ser Cys Pro Ala Ala Glu Lys
        35                  40                  45

Ile Ile Leu Glu Thr Val Arg Asn Ala Thr Leu Tyr Asp Pro Lys Val
    50                  55                  60

Pro Ala Arg Leu Leu Arg Met Phe Phe His Asp Cys Phe Ile Arg Gly
65                  70                  75                  80

Cys Asp Ala Ser Ile Leu Leu Asp Ser Thr Arg Ser Asn Gln Ala Glu
                85                  90                  95

Lys Asp Gly Pro Pro Asn Ile Ser Val Arg Ser Phe Tyr Val Ile Glu
            100                 105                 110

Asp Ala Lys Arg Lys Leu Glu Lys Ala Cys Pro Arg Thr Val Ser Cys
        115                 120                 125

Ala Asp Val Ile Ala Ile Ala Ala Arg Asp Val Val Thr Leu Ser Gly
    130                 135                 140

Gly Pro Tyr Trp Ser Val Leu Lys Gly Arg Lys Asp Gly Thr Ile Ser
145                 150                 155                 160

Arg Ala Asn Glu Thr Arg Asn Leu Pro Pro Pro Thr Phe Asn Val Ser
                165                 170                 175

Gln Leu Ile Gln Ser Phe Ala Ala Arg Gly Leu Ser Val Lys Asp Met
            180                 185                 190

Val Thr Leu Ser Gly Gly His Thr Ile Gly Phe Ser His Cys Ser Ser
        195                 200                 205

Phe Glu Ser Arg Leu Gln Asn Phe Ser Lys Phe His Asp Ile Asp Pro
    210                 215                 220

Ser Met Asn Tyr Ala Phe Ala Gln Thr Leu Lys Lys Lys Cys Pro Arg
225                 230                 235                 240

Thr Ser Asn Arg Gly Lys Asn Ala Gly Thr Val Leu Asp Ser Thr Ser
                245                 250                 255

Ser Val Phe Asp Asn Val Tyr Tyr Lys Gln Ile Leu Ser Gly Lys Gly
            260                 265                 270

Val Phe Gly Ser Asp Gln Ala Leu Leu Gly Asp Ser Arg Thr Lys Trp
        275                 280                 285

Ile Val Glu Thr Phe Ala Gln Asp Gln Lys Ala Phe Phe Arg Glu Phe
    290                 295                 300










Ala 


Ala 


Ser 


Met 


Val 


Lys 


Leu 


Gly 


Asn 


Phe 


Gly 


Val 


Lys 


Glu 


Thr 


Gly 


ll 305 










310 










315 










320 


-~ Gin 


Val 


Arg 


Val 


Asn 


Thr 


Arg 


Phe 


Val 


Asn 















325 330 



(2) INFORMATION FOR SEQ ID NO: 17: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 322 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..322 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580308 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



Met Ala Phe Ser Lys Gly Leu Ile Phe Ala Met Ile Phe Ala Val Leu
1 5 10 15

Ala Ile Val Lys Pro Ser Glu Ala Ala Leu Asp Ala His Tyr Tyr Asp
20 25 30

Gln Ser Cys Pro Ala Ala Glu Lys Ile Ile Leu Glu Thr Val Arg Asn
35 40 45

Ala Thr Leu Tyr Asp Pro Lys Val Pro Ala Arg Leu Leu Arg Met Phe
50 55 60

Phe His Asp Cys Phe Ile Arg Gly Cys Asp Ala Ser Ile Leu Leu Asp
65 70 75 80

Ser Thr Arg Ser Asn Gln Ala Glu Lys Asp Gly Pro Pro Asn Ile Ser
85 90 95

Val Arg Ser Phe Tyr Val Ile Glu Asp Ala Lys Arg Lys Leu Glu Lys
100 105 110

Ala Cys Pro Arg Thr Val Ser Cys Ala Asp Val Ile Ala Ile Ala Ala
115 120 125

Arg Asp Val Val Thr Leu Ser Gly Gly Pro Tyr Trp Ser Val Leu Lys
130 135 140

Gly Arg Lys Asp Gly Thr Ile Ser Arg Ala Asn Glu Thr Arg Asn Leu
145 150 155 160

Pro Pro Pro Thr Phe Asn Val Ser Gln Leu Ile Gln Ser Phe Ala Ala
165 170 175

Arg Gly Leu Ser Val Lys Asp Met Val Thr Leu Ser Gly Gly His Thr
180 185 190

Ile Gly Phe Ser His Cys Ser Ser Phe Glu Ser Arg Leu Gln Asn Phe
195 200 205

Ser Lys Phe His Asp Ile Asp Pro Ser Met Asn Tyr Ala Phe Ala Gln
210 215 220

Thr Leu Lys Lys Lys Cys Pro Arg Thr Ser Asn Arg Gly Lys Asn Ala
225 230 235 240

Gly Thr Val Leu Asp Ser Thr Ser Ser Val Phe Asp Asn Val Tyr Tyr
245 250 255

Lys Gln Ile Leu Ser Gly Lys Gly Val Phe Gly Ser Asp Gln Ala Leu
260 265 270

Leu Gly Asp Ser Arg Thr Lys Trp Ile Val Glu Thr Phe Ala Gln Asp
275 280 285

Gln Lys Ala Phe Phe Arg Glu Phe Ala Ala Ser Met Val Lys Leu Gly
290 295 300

Asn Phe Gly Val Lys Glu Thr Gly Gln Val Arg Val Asn Thr Arg Phe
305 310 315 320

Val Asn



(2) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 444 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..444

(D) OTHER INFORMATION: / Ceres Seq. ID 1580328
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

tgaagaagta tttgatagag aacaagcttg caaaggaagc agagctaaag tcaatagaga 
aaaagataga cgagttggtg gaggaagcgg ttgagtttgc agacgctagt ccacagcccg 
gtcgcagtca gttgctagag aatgtgtttg ctgatccaaa aggatttgga attggacctg 
atggacggta cagatgtgag gaccccaagt ttaccgaagc acagctcaag tctgagaaga 
caagtttaac cataagctgt ctactgtctc ttcgatgttt ctatatatct tattaagtta 
aatgctacag agaatcagtt tgaatcattt gcactttttg ctttttgttt ggtgttacta 
aattatcaca aggttcttct tgtagttcgt tgggttttca ttggttacca cttaccagag 
aattgtattt ttttttttaa agac 
(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..97

(D) OTHER INFORMATION: / Ceres Seq. ID 1580329 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



Lys Lys Tyr Leu Ile Glu Asn Lys Leu Ala Lys Glu Ala Glu Leu Lys
1 5 10 15

Ser Ile Glu Lys Lys Ile Asp Glu Leu Val Glu Glu Ala Val Glu Phe
20 25 30

Ala Asp Ala Ser Pro Gln Pro Gly Arg Ser Gln Leu Leu Glu Asn Val
35 40 45

Phe Ala Asp Pro Lys Gly Phe Gly Ile Gly Pro Asp Gly Arg Tyr Arg
50 55 60

Cys Glu Asp Pro Lys Phe Thr Glu Ala Gln Leu Lys Ser Glu Lys Thr
65 70 75 80

Ser Leu Thr Ile Ser Cys Leu Leu Ser Leu Arg Cys Phe Tyr Ile Ser
85 90 95

Tyr

(2) INFORMATION FOR SEQ ID NO:20: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 725 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..725

(D) OTHER INFORMATION: / Ceres Seq. ID 1580388 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



ctcacacaaa aatacaacaa cttagatcag tctcaaaggg ggaaaaaaaa cttaaaagaa 60 

acattaagag gcaacacaaa tcacacaaaa gatcaaattg aagcctaaga agaaggcaaa 120 

aagtgagaag caatggctac catgttgaag gtctctcttg tattgtcatt gttaggtttt 180 

ctggtgatcg ctgtcgtgac tccatcggcg gcgaacccat tcaggaagag cgtagttctc 240

ggagggaagt caggcgttcc aaacattcgg accaacaggg aaattcaaca acttggaagg 300 

tactgcgtgg agcaattcaa tcaacaagca cagaacgagc aaggaaacat aggatccatt 360 

gcgaaaacag acacggcaat ttcgaatcca ttgcaattta gccgagtagt gtctgctcag 420 

aaacaggtcg tcgctggact caaatactat ctaaggattg aagtcactca acccaatggc 480 

tctaccagga tgtttgactc tgttgtggtt attcaaccat ggctccattc taagcagttg 540 

ctcggtttca ctcctgttgt cagtcctgtc tactaacttt atttcttctt attcgactta 600 

aatttccata atatgatcaa gaaaagacta aaaggtgtat gatacaaagc tattaagaat 660 

gggttaatag ttggttttca tgatatgttt acgttgttca taaataaaaa caagttgtta 720 



ttagg 

(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 147 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..147 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580389 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 



Met Ala Thr Met Leu Lys Val Ser Leu Val Leu Ser Leu Leu Gly Phe
1 5 10 15

Leu Val Ile Ala Val Val Thr Pro Ser Ala Ala Asn Pro Phe Arg Lys
20 25 30

Ser Val Val Leu Gly Gly Lys Ser Gly Val Pro Asn Ile Arg Thr Asn
35 40 45

Arg Glu Ile Gln Gln Leu Gly Arg Tyr Cys Val Glu Gln Phe Asn Gln
50 55 60

Gln Ala Gln Asn Glu Gln Gly Asn Ile Gly Ser Ile Ala Lys Thr Asp
65 70 75 80

Thr Ala Ile Ser Asn Pro Leu Gln Phe Ser Arg Val Val Ser Ala Gln
85 90 95

Lys Gln Val Val Ala Gly Leu Lys Tyr Tyr Leu Arg Ile Glu Val Thr
100 105 110

Gln Pro Asn Gly Ser Thr Arg Met Phe Asp Ser Val Val Val Ile Gln
115 120 125

Pro Trp Leu His Ser Lys Gln Leu Leu Gly Phe Thr Pro Val Val Ser
130 135 140

Pro Val Tyr
145

(2) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 144 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..144

(D) OTHER INFORMATION: / Ceres Seq. ID 1580390 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
Met Leu Lys Val Ser Leu Val Leu Ser Leu Leu Gly Phe Leu Val Ile
1 5 10 15

Ala Val Val Thr Pro Ser Ala Ala Asn Pro Phe Arg Lys Ser Val Val
20 25 30

Leu Gly Gly Lys Ser Gly Val Pro Asn Ile Arg Thr Asn Arg Glu Ile
35 40 45

Gln Gln Leu Gly Arg Tyr Cys Val Glu Gln Phe Asn Gln Gln Ala Gln
50 55 60

Asn Glu Gln Gly Asn Ile Gly Ser Ile Ala Lys Thr Asp Thr Ala Ile
65 70 75 80

Ser Asn Pro Leu Gln Phe Ser Arg Val Val Ser Ala Gln Lys Gln Val
85 90 95

Val Ala Gly Leu Lys Tyr Tyr Leu Arg Ile Glu Val Thr Gln Pro Asn
100 105 110

Gly Ser Thr Arg Met Phe Asp Ser Val Val Val Ile Gln Pro Trp Leu
115 120 125

His Ser Lys Gln Leu Leu Gly Phe Thr Pro Val Val Ser Pro Val Tyr
130 135 140



(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 666 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..666 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580426 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
atttttcttc tcaccactta aagagtttcc ccagaaattt tcttccgccg taaaagcaaa 60
aaaagatgca gatcttcgtg aaaaccctaa cggggaagac gatcactctc gaggtcgagt 120
cctctgacac catcgacaat gtcaaggcca agatccaaga caaggaagga atcccaccgg 180
accagcagcg attgattttc gccggaaagc agctcgaaga cggacgtacc ttagccgatt 240
acaacatcca gaaggaatca acgcttcacc ttgtccttcg tctccgtgga ggtgctaaga 300
agaggaagaa gaagacctac accaagccta agaagatcaa gcacaagcac aagaaggtca 360
agctcgctgt tcttcagttc tacaaggttg atggttcagg taaggttcag cgtttgagga 420
aggagtgccc taacgcaacc tgtggtgctg ggactttcat ggcgagtcat ttcgatcgtc 480
actactgtgg taagtgtggt ctcacctacg tttaccagaa agaaggagct caggaatgat 540
tttcatctcg atctctatca ttttgaattg aatactgctt tttttttgga atttggaagt 600
tgtttttgga tgttgtggat cttatgttga acttgtttga atttcatatc taggtttttc 660
ttatgg

(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 157 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..157

(D) OTHER INFORMATION: / Ceres Seq. ID 1580427 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr Leu Glu
1 5 10 15

Val Glu Ser Ser Asp Thr Ile Asp Asn Val Lys Ala Lys Ile Gln Asp
20 25 30

Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu Ile Phe Ala Gly Lys
35 40 45

Gln Leu Glu Asp Gly Arg Thr Leu Ala Asp Tyr Asn Ile Gln Lys Glu
50 55 60

Ser Thr Leu His Leu Val Leu Arg Leu Arg Gly Gly Ala Lys Lys Arg
65 70 75 80

Lys Lys Lys Thr Tyr Thr Lys Pro Lys Lys Ile Lys His Lys His Lys
85 90 95

Lys Val Lys Leu Ala Val Leu Gln Phe Tyr Lys Val Asp Gly Ser Gly
100 105 110

Lys Val Gln Arg Leu Arg Lys Glu Cys Pro Asn Ala Thr Cys Gly Ala
115 120 125

Gly Thr Phe Met Ala Ser His Phe Asp Arg His Tyr Cys Gly Lys Cys
130 135 140

Gly Leu Thr Tyr Val Tyr Gln Lys Glu Gly Ala Gln Glu
145 150 155

(2) INFORMATION FOR SEQ ID NO: 25: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1340 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1340 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580481 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
gtacagcagg atcaacttat tacatagaaa taatggctca gggacaccaa ggcttatgct 60
ttgggtttat cttcttacta tgcttcttca gctctacttt tgcactcagg attcttaagg 120
acctccctat gatggagcca agaaagctag atctgaacca tgaaatgatg cctgggatgg 180
ttatgctaca gccagatcat gaggtccaac ctgagcacac ggtcaagcct ggccatacac 240
tccagtccat gtCyggTgTA TCagaacatc cwgagcacam ggwwaagcct grccatamac 300
tccagtcyat gtCyggTgCA TTagaacatc ctgagcacaa ggaaaagcct gaccataaac 360
tccagtctat gttgggagaa catcctgagc acaaggaaaa gcctgaccat aaactccagt 420
ctatgttggg agaacatcct gagcacaagg aaaagcctga ccataaactc cagtccatgt 480
ctggtgaatc aaaacatcct gagcacaaag tcaagcctga tcagataatg cagaccatgg 540
cgagtgaact ggaagaggaa gatcctgacc acacaagtaa gcctatgggg tatggtgtag 600
gtaggggcta tggaagtggt gggtctggtg ttggctatgg cgttggaatt ggctctagtg 660
gaggcagWtg gctttggaga aggtattggc tctagtggag gcaacggctt tggagaagga 720
attggctcta gtggaggcag cggctttgga gaaggaattg gctctagtgg aggcagcggc 780
tttggtgaag gaattggttc tggtggtggt acaggtattg gcattggaga aggaatagga 840
tccgggtctg cccaacccaa ttgcggccct gtcacgggag ctccaggtag cggattcggt 900
gaaggcatag gacaaggtag tggtccaggg gaagggattg gtatcggtat aggacagggt 960
ggacctagtg tagtagtccc aggcacaaga atccctccga tattggttcc aggcgcacag 1020
attcccggtt ttgtgattcc aggtatcaca gttcctggct acggtactgg atgccaaaca 1080
ggtggctgca atccgaaccc accacactac tatatcccac caagctgtcc ccattgtcca 1140
cctttcacat caggacaaga caaacacatg tcagacaaag gagccatgac cgaagctctt 1200
gcacccactt caccggagat atatgtttaa gctttaagag tccatgaaac accagtctaa 1260
taaatcactc ataatgaaat ctattttggt ttataaaagc ttcacagaaa gttgaataat 1320
acaactttct tgctcaagtg
(2) INFORMATION FOR SEQ ID NO:26: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 230 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..230

(D) OTHER INFORMATION: / Ceres Seq. ID 1580482
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:





Thr Ala Gly Ser Thr Tyr Tyr Ile Glu Ile Met Ala Gln Gly His Gln
1 5 10 15

Gly Leu Cys Phe Gly Phe Ile Phe Leu Leu Cys Phe Phe Ser Ser Thr
20 25 30

Phe Ala Leu Arg Ile Leu Lys Asp Leu Pro Met Met Glu Pro Arg Lys
35 40 45

Leu Asp Leu Asn His Glu Met Met Pro Gly Met Val Met Leu Gln Pro
50 55 60

Asp His Glu Val Gln Pro Glu His Thr Val Lys Pro Gly His Thr Leu
65 70 75 80

Gln Ser Met Xaa Gly Val Ser Glu His Xaa Glu His Xaa Xaa Lys Pro
85 90 95

Xaa His Xaa Leu Gln Xaa Met Xaa Gly Ala Leu Glu His Pro Glu His
100 105 110

Lys Glu Lys Pro Asp His Lys Leu Gln Ser Met Leu Gly Glu His Pro
115 120 125

Glu His Lys Glu Lys Pro Asp His Lys Leu Gln Ser Met Leu Gly Glu
130 135 140

His Pro Glu His Lys Glu Lys Pro Asp His Lys Leu Gln Ser Met Ser
145 150 155 160

Gly Glu Ser Lys His Pro Glu His Lys Val Lys Pro Asp Gln Ile Met
165 170 175

Gln Thr Met Ala Ser Glu Leu Glu Glu Glu Asp Pro Asp His Thr Ser
180 185 190

Lys Pro Met Gly Tyr Gly Val Gly Arg Gly Tyr Gly Ser Gly Gly Ser
195 200 205

Gly Val Gly Tyr Gly Val Gly Ile Gly Ser Ser Gly Gly Xaa Trp Leu
210 215 220

Trp Arg Arg Tyr Trp Leu
225 230
(2) INFORMATION FOR SEQ ID NO: 27: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 220 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..220

(D) OTHER INFORMATION: / Ceres Seq. ID 1580483 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Met Ala Gln Gly His Gln Gly Leu Cys Phe Gly Phe Ile Phe Leu Leu
1 5 10 15

Cys Phe Phe Ser Ser Thr Phe Ala Leu Arg Ile Leu Lys Asp Leu Pro
20 25 30

Met Met Glu Pro Arg Lys Leu Asp Leu Asn His Glu Met Met Pro Gly
35 40 45

Met Val Met Leu Gln Pro Asp His Glu Val Gln Pro Glu His Thr Val
50 55 60

Lys Pro Gly His Thr Leu Gln Ser Met Xaa Gly Val Ser Glu His Xaa
65 70 75 80

Glu His Xaa Xaa Lys Pro Xaa His Xaa Leu Gln Xaa Met Xaa Gly Ala
85 90 95

Leu Glu His Pro Glu His Lys Glu Lys Pro Asp His Lys Leu Gln Ser
100 105 110

Met Leu Gly Glu His Pro Glu His Lys Glu Lys Pro Asp His Lys Leu
115 120 125

Gln Ser Met Leu Gly Glu His Pro Glu His Lys Glu Lys Pro Asp His
130 135 140

Lys Leu Gln Ser Met Ser Gly Glu Ser Lys His Pro Glu His Lys Val
145 150 155 160

Lys Pro Asp Gln Ile Met Gln Thr Met Ala Ser Glu Leu Glu Glu Glu
165 170 175

Asp Pro Asp His Thr Ser Lys Pro Met Gly Tyr Gly Val Gly Arg Gly
180 185 190

Tyr Gly Ser Gly Gly Ser Gly Val Gly Tyr Gly Val Gly Ile Gly Ser
195 200 205

Ser Gly Gly Xaa Trp Leu Trp Arg Arg Tyr Trp Leu
210 215 220










(2) INFORMATION FOR SEQ ID NO: 28:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 206 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..206

(D) OTHER INFORMATION: / Ceres Seq. ID 1580484 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 



Met Glu Val Val Gly Leu Val Leu Ala Met Ala Leu Glu Leu Ala Leu
1 5 10 15

Val Glu Ala Xaa Gly Phe Gly Glu Gly Ile Gly Ser Ser Gly Gly Asn
20 25 30

Gly Phe Gly Glu Gly Ile Gly Ser Ser Gly Gly Ser Gly Phe Gly Glu
35 40 45

Gly Ile Gly Ser Ser Gly Gly Ser Gly Phe Gly Glu Gly Ile Gly Ser
50 55 60

Gly Gly Gly Thr Gly Ile Gly Ile Gly Glu Gly Ile Gly Ser Gly Ser
65 70 75 80

Ala Gln Pro Asn Cys Gly Pro Val Thr Gly Ala Pro Gly Ser Gly Phe
85 90 95

Gly Glu Gly Ile Gly Gln Gly Ser Gly Pro Gly Glu Gly Ile Gly Ile
100 105 110

Gly Ile Gly Gln Gly Gly Pro Ser Val Val Val Pro Gly Thr Arg Ile
115 120 125

Pro Pro Ile Leu Val Pro Gly Ala Gln Ile Pro Gly Phe Val Ile Pro
130 135 140

Gly Ile Thr Val Pro Gly Tyr Gly Thr Gly Cys Gln Thr Gly Gly Cys
145 150 155 160

Asn Pro Asn Pro Pro His Tyr Tyr Ile Pro Pro Ser Cys Pro His Cys
165 170 175

Pro Pro Phe Thr Ser Gly Gln Asp Lys His Met Ser Asp Lys Gly Ala
180 185 190

Met Thr Glu Ala Leu Ala Pro Thr Ser Pro Glu Ile Tyr Val
195 200 205

(2) INFORMATION FOR SEQ ID NO: 29:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 703 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..703

(D) OTHER INFORMATION: / Ceres Seq. ID 1580511
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
aaatctaggg ttttgtgtgc tcctccttcc aaagccgctg cgaaactcgt cgattgtcaa 60
aaccggagct cagagactga tctcggaaag taaatcgcag ccatggcgga aaaagctgtc 120
actatcagga ccagaaactt catgaccaac aggcttctcg ccagaaagca attcgttatt 180
gatgttcttc atcctggaag agccaatgtt tcaaaggctg agcttaagga gaaattggcg 240
aggatgtatg aggttaagga cccaaatgct atcttctgtt tcaagttcag aactcacttt 300
ggaggtggta aatcttctgg atatggtttg atctatgata ctgtcgagaa cgctaagaag 360
tttgagccta agtacagact tatcaggaat ggacttgaca ccaagattga gaaatcaagg 420
aaacagatca aggagaggaa gaacagggcg aagaagatcc gtggtgttaa gaagaccaag 480
gctggtgatg ccaagaagaa gtaagacgtt tgaagaggac agaagcagca atatggtttt 540
tgttctgcat cactcctttg aagccatgtt tgctattaga catcttctaa ttatttctcg 600
attttgtgct tctttttgtc acacttgctc tattttggtt tggatcttgt tttagagaaa 660
catgtggtta agattttgtt tgaatacatt gattttaaac ttc
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 133 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..133

(D) OTHER INFORMATION: / Ceres Seq. ID 1580512
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
Met Ala Glu Lys Ala Val Thr Ile Arg Thr Arg Asn Phe Met Thr Asn
1 5 10 15

Arg Leu Leu Ala Arg Lys Gln Phe Val Ile Asp Val Leu His Pro Gly
20 25 30

Arg Ala Asn Val Ser Lys Ala Glu Leu Lys Glu Lys Leu Ala Arg Met
35 40 45

Tyr Glu Val Lys Asp Pro Asn Ala Ile Phe Cys Phe Lys Phe Arg Thr
50 55 60

His Phe Gly Gly Gly Lys Ser Ser Gly Tyr Gly Leu Ile Tyr Asp Thr
65 70 75 80

Val Glu Asn Ala Lys Lys Phe Glu Pro Lys Tyr Arg Leu Ile Arg Asn
85 90 95

Gly Leu Asp Thr Lys Ile Glu Lys Ser Arg Lys Gln Ile Lys Glu Arg
100 105 110

Lys Asn Arg Ala Lys Lys Ile Arg Gly Val Lys Lys Thr Lys Ala Gly
115 120 125

Asp Ala Lys Lys Lys
130

(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 120 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..120

(D) OTHER INFORMATION: / Ceres Seq. ID 1580513
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:



Met Thr Asn Arg Leu Leu Ala Arg Lys Gln Phe Val Ile Asp Val Leu
1 5 10 15

His Pro Gly Arg Ala Asn Val Ser Lys Ala Glu Leu Lys Glu Lys Leu
20 25 30

Ala Arg Met Tyr Glu Val Lys Asp Pro Asn Ala Ile Phe Cys Phe Lys
35 40 45

Phe Arg Thr His Phe Gly Gly Gly Lys Ser Ser Gly Tyr Gly Leu Ile
50 55 60

Tyr Asp Thr Val Glu Asn Ala Lys Lys Phe Glu Pro Lys Tyr Arg Leu
65 70 75 80

Ile Arg Asn Gly Leu Asp Thr Lys Ile Glu Lys Ser Arg Lys Gln Ile
85 90 95

Lys Glu Arg Lys Asn Arg Ala Lys Lys Ile Arg Gly Val Lys Lys Thr
100 105 110

Lys Ala Gly Asp Ala Lys Lys Lys
115 120


















(2) INFORMATION FOR SEQ ID NO: 32:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 106 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..106

(D) OTHER INFORMATION: / Ceres Seq. ID 1580514



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



Met Ile Leu Ser Arg Thr Leu Arg Ser Leu Ser Leu Ser Thr Asp Leu
1 5 10 15

Ser Gly Met Asp Leu Thr Pro Arg Leu Arg Asn Gln Gly Asn Arg Ser
20 25 30

Arg Arg Gly Arg Thr Gly Arg Arg Arg Ser Val Val Leu Arg Arg Pro
35 40 45

Arg Leu Val Met Pro Arg Arg Ser Lys Thr Phe Glu Glu Asp Arg Ser
50 55 60

Ser Asn Met Val Phe Val Leu His His Ser Phe Glu Ala Met Phe Ala
65 70 75 80

Ile Arg His Leu Leu Ile Ile Ser Arg Phe Cys Ala Ser Phe Cys His
85 90 95

Thr Cys Ser Ile Leu Val Trp Ile Leu Phe
100 105














(2) INFORMATION FOR SEQ ID NO: 33:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 513 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..513

(D) OTHER INFORMATION: / Ceres Seq. ID 1580599 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
atgcattatc atcgtatgat ttatgttggt gataaatagg tggggcagaa aggaggaacg 60
ggaccaggac ctagaagttc acacggcata gccgcggtcg gagacaagct ctacagtttc 120
ggcggcgagt taacaccaaa caaacacatc gacaaagacc tctacgtctt tgacttcaac 180
actcaaactt ggtcaatcgc tcaacccaaa ggagacgccc caactgtatc ctgcttaggc 240
gtgcgcatgg tggccgtggg aactaagatc tatatctttg gaggccgcga tgagaaccgc 300
aacttcgaaa actttcgctc ctacgatacg gtgacatccg agtggacatt cctgacgaag 360
cttgatgagg tgggaggacc cgaggctcgt actttccatt cgatggcttc ggatgaaaac 420
catgtgtatg tattcggtgg ggtgagcaaa ggcggtacta tgaatactcc cacgcggttc 480
aggacaatcg aggcgtataa cattgctgat ggg
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 89 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..89

(D) OTHER INFORMATION: / Ceres Seq. ID 1580600
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
Met Val Ala Val Gly Thr Lys Ile Tyr Ile Phe Gly Gly Arg Asp Glu
1 5 10 15

Asn Arg Asn Phe Glu Asn Phe Arg Ser Tyr Asp Thr Val Thr Ser Glu
20 25 30

Trp Thr Phe Leu Thr Lys Leu Asp Glu Val Gly Gly Pro Glu Ala Arg
35 40 45

Thr Phe His Ser Met Ala Ser Asp Glu Asn His Val Tyr Val Phe Gly
50 55 60

Gly Val Ser Lys Gly Gly Thr Met Asn Thr Pro Thr Arg Phe Arg Thr
65 70 75 80

Ile Glu Ala Tyr Asn Ile Ala Asp Gly
85

(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 452 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..452

(D) OTHER INFORMATION: / Ceres Seq. ID 1580618
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
aattcactga ttattgtttt aaggcaaatt aagatcatct tcaaaatctt ctcagatctc 60
ttccaatttt ctagaaaaaa catgtcttgc tgtggtggaa gctgtggttg tggatctgcc 120
tgcaagtgcg gcaatggttg cggaggttgc aaaaggtacc ctgacttgga gaacaccgcc 180
accgagactc ttgtcctcgg tgttgctccg gcgatgaact ctcagtacga ggcttccggc 240
gagactttcg ttgccgagaa tgatgcctgc aaatgcggat ctgactgcaa gtgcaaccct 300
tgtacctgca aatgaagaac ttcataaacc ctaagtctgt aataacccta atgttatgtt 360
aggtttgctt atatgtaata attggctgat ttttccggta gttttgccgg cgacgttggt 420
ctttctcttc ttcttctata aatggatgct gt
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 104 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..104

(D) OTHER INFORMATION: / Ceres Seq. ID 1580619
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:



Asn Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile
1 5 10 15

Phe Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly
20 25 30

Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly
35 40 45

Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu
50 55 60

Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly
65 70 75 80

Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys
85 90 95

Lys Cys Asn Pro Cys Thr Cys Lys
100



(2) INFORMATION FOR SEQ ID NO: 37: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580620 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 38: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..313 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580629 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aaatgcggat ctgactgcaa gtgcaaccct 
tgtaccnttt tgg 

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 93 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..93 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580630 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe 
1 5 10 15 

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly 
20 25 30 

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly 
35 40 45 

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val 
50 55 60 

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu 
65 70 75 80 

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Met Arg Ile 
85 90 

(2) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 67 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..67 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580631 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Met Arg Ile 
65 

(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 492 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..492 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580663 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
attcaytgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctks 
ttccaatttt kstagaaaaa acatgtcttg ctgtggtgga agctgtggtt gtggatctgc 
ctgcaagtgc ggcaatggtt gcggaggttg caaaaggtac cctgacttgg agaacaccgc 
caccgagact cttgtcctcg gtgttgctcc ggcgatgaac tctcagtacg aggcttccgg 
cgagactttc gttgccgaga atgatgcctg caaatgcgga tctgactgca agtgcaaccc 
ttgtacctgc aaatgaagaa cttcataaac cctaagtstg taataaccct aatgttatgt 
taggtttgnt tatatgtaat aattggbtga tttttccggt agttttgccg gcgacgttgg 
htttvtcttc ttgaaaagct tcaccgaagc gacagttctt ctagctcctc aagtgaggaa 
gaaggttcag at 

(2) INFORMATION FOR SEQ ID NO: 42: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 71 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..71 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580664 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Ser Xaa Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe 
1 5 10 15 

Ser Asp Xaa Xaa Pro Ile Xaa Xaa Glu Lys Thr Cys Leu Ala Val Val 
20 25 30 

Glu Ala Val Val Val Asp Leu Pro Ala Ser Ala Ala Met Val Ala Glu 
35 40 45 

Val Ala Lys Gly Thr Leu Thr Trp Arg Thr Pro Pro Pro Arg Leu Leu 
50 55 60 

Ser Ser Val Leu Leu Arg Arg 
65 70 

(2) INFORMATION FOR SEQ ID NO: 43: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580665 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 504 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..504 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580681 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggm gacgttggtc 
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tggtcattaa gatatctctg 
caaatatacc acgaatcctt gatt 
(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580682 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe 
1 5 10 15 

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly 
20 25 30 

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly 
35 40 45 

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val 
50 55 60 

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu 
65 70 75 80 

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys 
85 90 95 

Cys Asn Pro Cys Thr Cys Lys 
100 

(2) INFORMATION FOR SEQ ID NO: 46: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580683 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 47: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 494 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..494 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580697 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttgaag 
ctccggagaa aatcgagaag atcggatccg aaatctcatc cctaaccctc gaagaagctc 
gtatcctcgt cgac 

(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580698 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe 
1 5 10 15 

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly 
20 25 30 

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly 
35 40 45 

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val 
50 55 60 

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu 
65 70 75 80 

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys 
85 90 95 

Cys Asn Pro Cys Thr Cys Lys 
100 

(2) INFORMATION FOR SEQ ID NO: 49: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580699 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..485 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580706 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
gattattgtt ttaaggcaaa ttaagatcat cttcaaaatc ttctcagatc tcttccaatt 
ttctagaaaa aacatgtctt gctgtggtgg aagctgtggt tgtggatctg cctgcaagtg 
cggcaatggt tgcggaggtt gcaaaaggta ccctgacttg gagaacaccg ccaccgagac 
tcttgtcctc ggtgttgctc cggcgatgaa ctctcagtac gaggcttccg gcgaaacttt 
cgttgccgag aatgatgcct gcaaatgcgg atctgactgc aagtgcaacc cttgtacctg 
caaatgaaga acttcataaa ccctaagtct gtaataaccc taatgttatg ttaggtttgc 
ttatatgtaa taattggctg atttttccgg tagttttgcc ggcgacgttg gaggttattc 
agaagaagcg agcagagaag cctgaagttc gtgatgccgc tagagaagct gccctacgtg 
agatc 

(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 101 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..101 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580707 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe Ser Asp 
1 5 10 15 

Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly Ser Cys 
20 25 30 

Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly Cys Lys 
35 40 45 

Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val Leu Gly 
50 55 60 

Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu Thr Phe 
65 70 75 80 

Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys Cys Asn 
85 90 95 

Pro Cys Thr Cys Lys 
100 

(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580708 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 630 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..630 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580743 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
aattatttcc tacacaaaat tcccactcac cacacacaac aaaagaatag tgatcgaagc 
tcatggcgtc tcttgcaacc gtcgccgctg tgaaaccatc cgccgccata aaaggactcg 
gcggcagctc actcgccgga gctaagctct ccatcaagcc ttcccgcctg agctttaaac 
ccaaatccat ccgggctaat ggtgtggtgg ctaagtatgg agacaaaagt gtctactttg 
acttagaaga tttgggtaac acaacaggtc aatgggacgt atacggctct gatgctcctt 
ctccttacaa tcctcttcag agcaagttct ttgagacatt cgctgcccca ttcacaaaga 
gaggattgct cctcaagttc ttgatccttg gaggaggctc tttgcttact tatgtcagcg 
ctacctctac cggcgaagtt cttcccatca agagaggtcc tcaggagccg cctaagctcg 
gtcctcgcgg caagctctga tctatattca tgttaccttt ctcttcttcc ttctaaaact 
catcaacatt tctcaatact gcaaaccctt ttaagtaatt ttatgtatat tatgtttatc 
tgttacattt gaaacgtttt ttcttcccct 
(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 145 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..145 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580744 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Met Ala Ser Leu Ala Thr Val Ala Ala Val Lys Pro Ser Ala Ala Ile 
1 5 10 15 

Lys Gly Leu Gly Gly Ser Ser Leu Ala Gly Ala Lys Leu Ser Ile Lys 
20 25 30 

Pro Ser Arg Leu Ser Phe Lys Pro Lys Ser Ile Arg Ala Asn Gly Val 
35 40 45 

Val Ala Lys Tyr Gly Asp Lys Ser Val Tyr Phe Asp Leu Glu Asp Leu 
50 55 60 

Gly Asn Thr Thr Gly Gln Trp Asp Val Tyr Gly Ser Asp Ala Pro Ser 
65 70 75 80 

Pro Tyr Asn Pro Leu Gln Ser Lys Phe Phe Glu Thr Phe Ala Ala Pro 
85 90 95 

Phe Thr Lys Arg Gly Leu Leu Leu Lys Phe Leu Ile Leu Gly Gly Gly 
100 105 110 

Ser Leu Leu Thr Tyr Val Ser Ala Thr Ser Thr Gly Glu Val Leu Pro 
115 120 125 

Ile Lys Arg Gly Pro Gln Glu Pro Pro Lys Leu Gly Pro Arg Gly Lys 
130 135 140 

Leu 
145 

(2) INFORMATION FOR SEQ ID NO: 55: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..993 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580787 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
gagacaatcg actttctaaa atcggtaaac gatttcatct cttaaagctt ctcctccatt 
cttcacacga gtcaatttaa ttgagggttt tgtttgatcg attcttgaat cagatcaatt 
tgatcttcaa aaactaggat ggcgactgag aaaccgatta cgacggagac tgttgctctc 
actgagaaga aaatggacat gtctttagat gagattatca agatggaaaa gagcaatacc 
aatgtgaata agggcaagaa acagagagta ttgaataaaa aggagaaatt tagtggtgct 
gcgaagaata gtgcggtgaa agcacagcgt tatatggact ctcggtctga tgttagacag 
ggtgcttttg ctaagaagag gtctaatttc caaggaaacc agtttcctgt aacaacaacc 
gttgctcgta aagccgcttc tgctactccg cgtggtagac cttataatgg tggaaggatg 
actaatacga atcaatcaag gtttattgct ccaccagctc agaatagagc ttcacaaaga 
gggtttgtcg caaagcagca gcagcagcaa agggagaaga tagtgcagca gcaggcaaat 
ggaggaggag gagggcaaag gcaatggcct cagacactgg attctcggtt tgcaaacatg 
aaggaagaga gaatgagaat gagaaggttt gcagacaata gaagcaatgt aggcaacaat 
ggagctggat cgcatcagca gcagcgttcg atggtcccgt gggtgagaag agctacaaga 
ttccccaact gatttatgac ctgcagaata gtgttgtttc aagggtaggg tgaacatatt 
tgctacttat gtagtttggt ttggattcat tgtatcaagt gtagaacatt cgtatgtgaa 
gctctaaaac cttgaatctt tttcttggct gtccttagtg tgttttgacc attttactct 
ttcatctctt cgttaaaaaa atcagttcag gtc 
(2) INFORMATION FOR SEQ ID NO: 56: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 217 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..217 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580788 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Met Ala Thr Glu Lys Pro Ile Thr Thr Glu Thr Val Ala Leu Thr Glu 
1 5 10 15 

Lys Lys Met Asp Met Ser Leu Asp Glu Ile Ile Lys Met Glu Lys Ser 
20 25 30 

Asn Thr Asn Val Asn Lys Gly Lys Lys Gln Arg Val Leu Asn Lys Lys 
35 40 45 

Glu Lys Phe Ser Gly Ala Ala Lys Asn Ser Ala Val Lys Ala Gln Arg 
50 55 60 

Tyr Met Asp Ser Arg Ser Asp Val Arg Gln Gly Ala Phe Ala Lys Lys 
65 70 75 80 

Arg Ser Asn Phe Gln Gly Asn Gln Phe Pro Val Thr Thr Thr Val Ala 
85 90 95 

Arg Lys Ala Ala Ser Ala Thr Pro Arg Gly Arg Pro Tyr Asn Gly Gly 
100 105 110 

Arg Met Thr Asn Thr Asn Gln Ser Arg Phe Ile Ala Pro Pro Ala Gln 
115 120 125 

Asn Arg Ala Ser Gln Arg Gly Phe Val Ala Lys Gln Gln Gln Gln Gln 
130 135 140 

Arg Glu Lys Ile Val Gln Gln Gln Ala Asn Gly Gly Gly Gly Gly Gln 
145 150 155 160 

Arg Gln Trp Pro Gln Thr Leu Asp Ser Arg Phe Ala Asn Met Lys Glu 
165 170 175 

Glu Arg Met Arg Met Arg Arg Phe Ala Asp Asn Arg Ser Asn Val Gly 
180 185 190 

Asn Asn Gly Ala Gly Ser His Gln Gln Gln Arg Ser Met Val Pro Trp 
195 200 205 

Val Arg Arg Ala Thr Arg Phe Pro Asn 
210 215 
(2) INFORMATION FOR SEQ ID NO: 57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 199 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..199 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580789 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

Met Asp Met Ser Leu Asp Glu Ile Ile Lys Met Glu Lys Ser Asn Thr 
1 5 10 15 

Asn Val Asn Lys Gly Lys Lys Gln Arg Val Leu Asn Lys Lys Glu Lys 
20 25 30 

Phe Ser Gly Ala Ala Lys Asn Ser Ala Val Lys Ala Gln Arg Tyr Met 
35 40 45 

Asp Ser Arg Ser Asp Val Arg Gln Gly Ala Phe Ala Lys Lys Arg Ser 
50 55 60 

Asn Phe Gln Gly Asn Gln Phe Pro Val Thr Thr Thr Val Ala Arg Lys 
65 70 75 80 

Ala Ala Ser Ala Thr Pro Arg Gly Arg Pro Tyr Asn Gly Gly Arg Met 
85 90 95 

Thr Asn Thr Asn Gln Ser Arg Phe Ile Ala Pro Pro Ala Gln Asn Arg 
100 105 110 

Ala Ser Gln Arg Gly Phe Val Ala Lys Gln Gln Gln Gln Gln Arg Glu 
115 120 125 

Lys Ile Val Gln Gln Gln Ala Asn Gly Gly Gly Gly Gly Gln Arg Gln 
130 135 140 

Trp Pro Gln Thr Leu Asp Ser Arg Phe Ala Asn Met Lys Glu Glu Arg 
145 150 155 160 

Met Arg Met Arg Arg Phe Ala Asp Asn Arg Ser Asn Val Gly Asn Asn 
165 170 175 

Gly Ala Gly Ser His Gln Gln Gln Arg Ser Met Val Pro Trp Val Arg 
180 185 190 

Arg Ala Thr Arg Phe Pro Asn 
195 

(2) INFORMATION FOR SEQ ID NO: 58: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 197 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..197 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580790 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

Met Ser Leu Asp Glu Ile Ile Lys Met Glu Lys Ser Asn Thr Asn Val 
1 5 10 15 

Asn Lys Gly Lys Lys Gln Arg Val Leu Asn Lys Lys Glu Lys Phe Ser 
20 25 30 

Gly Ala Ala Lys Asn Ser Ala Val Lys Ala Gln Arg Tyr Met Asp Ser 
35 40 45 

Arg Ser Asp Val Arg Gln Gly Ala Phe Ala Lys Lys Arg Ser Asn Phe 
50 55 60 

Gln Gly Asn Gln Phe Pro Val Thr Thr Thr Val Ala Arg Lys Ala Ala 
65 70 75 80 

Ser Ala Thr Pro Arg Gly Arg Pro Tyr Asn Gly Gly Arg Met Thr Asn 
85 90 95 

Thr Asn Gln Ser Arg Phe Ile Ala Pro Pro Ala Gln Asn Arg Ala Ser 
100 105 110 

Gln Arg Gly Phe Val Ala Lys Gln Gln Gln Gln Gln Arg Glu Lys Ile 
115 120 125 

Val Gln Gln Gln Ala Asn Gly Gly Gly Gly Gly Gln Arg Gln Trp Pro 
130 135 140 

Gln Thr Leu Asp Ser Arg Phe Ala Asn Met Lys Glu Glu Arg Met Arg 
145 150 155 160 

Met Arg Arg Phe Ala Asp Asn Arg Ser Asn Val Gly Asn Asn Gly Ala 
165 170 175 

Gly Ser His Gln Gln Gln Arg Ser Met Val Pro Trp Val Arg Arg Ala 
180 185 190 

Thr Arg Phe Pro Asn 
195 

(2) INFORMATION FOR SEQ ID NO: 59: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 468 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..468 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580805 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 
tttctcttct ttctctctct cccttgtagg tttccattga aacagaac 
(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580806 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe 
1 5 10 15 

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly 
20 25 30 

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly 
35 40 45 

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val 
50 55 60 

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu 
65 70 75 80 

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys 
85 90 95 

Cys Asn Pro Cys Thr Cys Lys 
100 

(2) INFORMATION FOR SEQ ID NO: 61: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580807 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 477 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..477 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580822 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
gattattgtt ttaaggcaaa ttaagatcat cttcaaaatc ttctaagatc tcttccaatt 
ttctagaaaa aacatgtctt gctgtggtgg aagctgtggt tgtggatctg cctgcaagtg 
cggcaatggt tgcggaggtt gcaaaaggta ccctgacttg gagaacaccg ccaccgagac 
tcttgtcctc ggtgttgctc cggcgatgaa ctctcagtac gaggcttccg gcgagacttt 
cgttgccgag aatgatgcct gcaaatgcgg atctgactgc aagtgcaacc cttgtacctg 
caaatgaaga acttcataaa ccctaagtct gtaataaccc taatgttatg ttaggtttgc 
ttatatgtaa taattggctg atttttccgg tagttttgcc ggcgacgttg gtctttctct 
tcttcttctt cttctgtgtg tgtttttatg gtttggagaa tgtgaggttc tacgccg 
(2) INFORMATION FOR SEQ ID NO: 63: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580823 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys 
1 5 10 15 

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr 
20 25 30 

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln 
35 40 45 

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys 
50 55 60 

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys 
65 70 75 

(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..43 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580824 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

Met Leu Gly Leu Leu Ile Cys Asn Asn Trp Leu Ile Phe Pro Val Val 
1 5 10 15 

Leu Pro Ala Thr Leu Val Phe Leu Phe Phe Phe Phe Phe Cys Val Cys 
20 25 30 

Phe Tyr Gly Leu Glu Asn Val Arg Phe Tyr Ala 
35 40 
(2) INFORMATION FOR SEQ ID NO: 65: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1552 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1552 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580870 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
ccccaaaaat ctagggtttt 60 
cttgcgaaga tgagcagaca 120 
cgaggaggat cgtgccgggc 180 
acaacactct gatatccttg 240 
aagagagatt cagaaccaat 300 
tcttgaagct aagtatcaaa 360 
gaatggtgtg gtcgaagttg 420 
taaatcagct gaagagaaag 480 
aattactgcg gaagagataa 540 
caagtggagt agggttgaag 600 
tccttacttc aagaacactg 660 
tatccttgag aaggccctcg 720 
gaagattcta aaaaagaagc 780 
tgaggactgt gagagtttct 840 
ggatcttgat gatgacatgg 900 
cggttcaaca atcaaagaga 960 
tgttgaggca gatgaccttg 1020 
tgaagaggac gaggaagatg 1080 
atgaggatga cgaggaggat gatgaggatg atgacgagga ggaagaagca gatcaaggaa 
agaagagcaa aaagaagtca tcagctgggc acaagaaggc tggaagaagt caacttgcgg 
aaggtcaagc aggtgagagg ccaccggaat gtaagcagca gtgaagaagt gaagaatctt 
ggcttagtta tgatgaagaa gaagagtgaa gagtgtcttt gagccgaggt tgtgtttctt 
taatttgcag agtcatggtc cggtttatta tatatcagtt ttgggtgatt ggtttgctat 
ttaaaaaaaa aaaatgggtt ctttggtttg gtttgtgtct cttgattttt ccttttgtaa 
tgatcttatg aatttgtttc gagttaatgt cgttctctgg tcagatttcg aattcaattc 
tatttatcct ccctcgttaa tgagagaatt tgtgagacaa tctagtttac tt 
(2) INFORMATION FOR SEQ ID NO: 66: 
<i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..371 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580871 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
Met Asn Met Ser Asp Leu Ser Thr Ala Leu Asn Glu Glu Asp Arg Ala 
15 10 15 

Gly Leu Val Asn Ala Leu Lys Asn Lys Leu Gin Asn Leu Ala Gly Gin 

20 25 30 

His Ser Asp lie Leu Glu Asn Leu Thr Pro Pro Val Arg Lys Arg Val 

35 40 45 

Glu Phe Leu Arg Glu lie Gin Asn Gin Tyr Asp Glu Met Glu Ala Lys 

50 55 60 

Phe Phe Glu Glu Arg Ala Ala Leu Glu Ala Lys Tyr Gin Lys Leu Tyr 
65 70 75 80 

Gin Pro Leu Tyr Thr Lys Arg Tyr Glu He Val Asn Gly Val Val Glu 

85 90 95 

Val Glu Gly Ala Ala Glu Glu Val Lys Ser Glu Gin Gly Glu Asp Lys 

100 105 HO 

Ser Ala Glu Glu Lys Gly Val Pro Asp Phe Trp Leu He Ala Leu Lys 

115 120 125 

Asn Asn Glu He Thr Ala Glu Glu He Thr Glu Arg Asp Glu Gly Ala 

130 135 140 

Leu Lys Tyr Leu Lys Asp He Lys Trp Ser Arg Val Glu Glu Pro Lys 
145 150 155 160 

Gly Phe Lys Leu Glu Phe Phe Phe Asp Gin Asn Pro Tyr Phe Lys Asn 

165 170 175 

Thr Val Leu Thr Lys Thr Tyr His Met He Asp Glu Asp Glu Pro He 

180 185 190 

Leu Glu Lys Ala Leu Gly Thr Glu He Glu Trp Tyr Pro Gly Lys Cys 

195 200 205 

Leu Thr Gin Lys He Leu Lys Lys Lys Pro Lys Lys Gly Ser Lys Asn 

210 215 220 

Thr Lys Pro He Thr Lys Thr Glu Asp Cys Glu Ser Phe Phe Asn Phe 
225 230 235 240 

Phe Ser Pro Pro Gin Val Pro Asp Asp Asp Glu Asp Leu Asp Asp Asp 

245 250 255 

Met Ala Asp Glu Leu Gin Gly Gin Met Glu His Asp Tyr Asp He Gly 

260 265 270 

Ser Thr He Lys Glu Lys He He Ser His Ala Val Ser Trp Phe Thr 

275 280 285 

Gly Glu Ala Val Glu Ala Asp Asp Leu Asp He Glu Asp Asp Asp Asp 

290 295 300 

Glu He Asp Glu Asp Asp Asp Glu Glu Asp Glu Glu Asp Asp Glu Asp 
305 310 315 320 

Asp Glu Glu Asp Asp Glu Asp Asp Asp Glu Glu Glu Glu Ala Asp Gin 



1140 
1200 
1260 
1320 
1380 
1440 
1500 
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325 330 335 

Gly Lys Lys Ser Lys Lys Lys Ser Ser Ala Gly His Lys Lys Ala Gly 

340 345 350 

Arg Ser Gin Leu Ala Glu Gly Gin Ala Gly Glu Arg Pro Pro Glu Cys 

355 360 365 

Lys Gin Gin 
370 

(2) INFORMATION FOR SEQ ID NO: 67: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 369 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..369

(D) OTHER INFORMATION: / Ceres Seq. ID 1580872
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
Met Ser Asp Leu Ser Thr Ala Leu Asn Glu Glu Asp Arg Ala Gly Leu
1 5 10 15

Val Asn Ala Leu Lys Asn Lys Leu Gln Asn Leu Ala Gly Gln His Ser
20 25 30

Asp Ile Leu Glu Asn Leu Thr Pro Pro Val Arg Lys Arg Val Glu Phe
35 40 45

Leu Arg Glu Ile Gln Asn Gln Tyr Asp Glu Met Glu Ala Lys Phe Phe
50 55 60

Glu Glu Arg Ala Ala Leu Glu Ala Lys Tyr Gln Lys Leu Tyr Gln Pro
65 70 75 80

Leu Tyr Thr Lys Arg Tyr Glu Ile Val Asn Gly Val Val Glu Val Glu
85 90 95

Gly Ala Ala Glu Glu Val Lys Ser Glu Gln Gly Glu Asp Lys Ser Ala
100 105 110

Glu Glu Lys Gly Val Pro Asp Phe Trp Leu Ile Ala Leu Lys Asn Asn
115 120 125

Glu Ile Thr Ala Glu Glu Ile Thr Glu Arg Asp Glu Gly Ala Leu Lys
130 135 140

Tyr Leu Lys Asp Ile Lys Trp Ser Arg Val Glu Glu Pro Lys Gly Phe
145 150 155 160

Lys Leu Glu Phe Phe Phe Asp Gln Asn Pro Tyr Phe Lys Asn Thr Val
165 170 175

Leu Thr Lys Thr Tyr His Met Ile Asp Glu Asp Glu Pro Ile Leu Glu
180 185 190

Lys Ala Leu Gly Thr Glu Ile Glu Trp Tyr Pro Gly Lys Cys Leu Thr
195 200 205

Gln Lys Ile Leu Lys Lys Lys Pro Lys Lys Gly Ser Lys Asn Thr Lys
210 215 220

Pro Ile Thr Lys Thr Glu Asp Cys Glu Ser Phe Phe Asn Phe Phe Ser
225 230 235 240

Pro Pro Gln Val Pro Asp Asp Asp Glu Asp Leu Asp Asp Asp Met Ala
245 250 255

Asp Glu Leu Gln Gly Gln Met Glu His Asp Tyr Asp Ile Gly Ser Thr
260 265 270

Ile Lys Glu Lys Ile Ile Ser His Ala Val Ser Trp Phe Thr Gly Glu
275 280 285

Ala Val Glu Ala Asp Asp Leu Asp Ile Glu Asp Asp Asp Asp Glu Ile
290 295 300

Asp Glu Asp Asp Asp Glu Glu Asp Glu Glu Asp Asp Glu Asp Asp Glu
305 310 315 320

Glu Asp Asp Glu Asp Asp Asp Glu Glu Glu Glu Ala Asp Gln Gly Lys
325 330 335

Lys Ser Lys Lys Lys Ser Ser Ala Gly His Lys Lys Ala Gly Arg Ser
340 345 350

Gln Leu Ala Glu Gly Gln Ala Gly Glu Arg Pro Pro Glu Cys Lys Gln
355 360 365

Gln

(2) INFORMATION FOR SEQ ID NO: 68:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 311 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..311

(D) OTHER INFORMATION: / Ceres Seq. ID 1580873
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

Met Glu Ala Lys Phe Phe Glu Glu Arg Ala Ala Leu Glu Ala Lys Tyr
1 5 10 15

Gln Lys Leu Tyr Gln Pro Leu Tyr Thr Lys Arg Tyr Glu Ile Val Asn
20 25 30

Gly Val Val Glu Val Glu Gly Ala Ala Glu Glu Val Lys Ser Glu Gln
35 40 45

Gly Glu Asp Lys Ser Ala Glu Glu Lys Gly Val Pro Asp Phe Trp Leu
50 55 60

Ile Ala Leu Lys Asn Asn Glu Ile Thr Ala Glu Glu Ile Thr Glu Arg
65 70 75 80

Asp Glu Gly Ala Leu Lys Tyr Leu Lys Asp Ile Lys Trp Ser Arg Val
85 90 95

Glu Glu Pro Lys Gly Phe Lys Leu Glu Phe Phe Phe Asp Gln Asn Pro
100 105 110

Tyr Phe Lys Asn Thr Val Leu Thr Lys Thr Tyr His Met Ile Asp Glu
115 120 125

Asp Glu Pro Ile Leu Glu Lys Ala Leu Gly Thr Glu Ile Glu Trp Tyr
130 135 140

Pro Gly Lys Cys Leu Thr Gln Lys Ile Leu Lys Lys Lys Pro Lys Lys
145 150 155 160

Gly Ser Lys Asn Thr Lys Pro Ile Thr Lys Thr Glu Asp Cys Glu Ser
165 170 175

Phe Phe Asn Phe Phe Ser Pro Pro Gln Val Pro Asp Asp Asp Glu Asp
180 185 190

Leu Asp Asp Asp Met Ala Asp Glu Leu Gln Gly Gln Met Glu His Asp
195 200 205

Tyr Asp Ile Gly Ser Thr Ile Lys Glu Lys Ile Ile Ser His Ala Val
210 215 220

Ser Trp Phe Thr Gly Glu Ala Val Glu Ala Asp Asp Leu Asp Ile Glu
225 230 235 240

Asp Asp Asp Asp Glu Ile Asp Glu Asp Asp Asp Glu Glu Asp Glu Glu
245 250 255

Asp Asp Glu Asp Asp Glu Glu Asp Asp Glu Asp Asp Asp Glu Glu Glu
260 265 270

Glu Ala Asp Gln Gly Lys Lys Ser Lys Lys Lys Ser Ser Ala Gly His
275 280 285

Lys Lys Ala Gly Arg Ser Gln Leu Ala Glu Gly Gln Ala Gly Glu Arg
290 295 300

Pro Pro Glu Cys Lys Gln Gln
305 310
(2) INFORMATION FOR SEQ ID NO: 69: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 496 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..496 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580898 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gamgttggtc 
tttstcttct tcttcttctt ctgtgtgtgt ttttatggtt tggaattgta atatttggac 
catggttttt tagtac 

(2) INFORMATION FOR SEQ ID NO:70: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580899 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 



Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 71:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1580900
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 72: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 501 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..501

(D) OTHER INFORMATION: / Ceres Seq. ID 1580910
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg caatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tgctgtctgt ttttatttta 480
atcaatggag ctatttgtta c
(2) INFORMATION FOR SEQ ID NO: 73:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1580911
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 74: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1580912
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 75: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1591 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1591 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580921 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
attgacgacg agacatagac gagagagacc ctctcaaatc tcaagctcct ttccttcctt 60
ccttcgatct acactctcag attctctctc tctctttcct ctctctttgt cttctctcag 120
gttttgtaat ggcggaatct tccaaagtcg ttcatgttcg taatgttggt catgagattt 180
ctgaaaatga tttgcttcag ctttttcaac catttggtgt tataactaag cttgtgatgc 240
tcagggcaaa gaatcaggct cttctccaaa tgcaagatgt ttcttctgca gttagtgctc 300
ttcagttttt tactaatgtt cagccaacta taaggaatgt gtatgtccaa ttctcttctc 360
atcaagaatt gacaacaata gagcagaata ttcatggaag ggaagatgag ccgaaccgta 420
ttctcttagt cacaatccat cacatgctgt atccaattac tgttgacgtc ctgcatcaag 480
ttttttctcc ctatggattt gtcgagaagc tcgtcacttt ccagaagtct gctggttttc 540
aagctcttat acagtatcag gtacaacagt gtgctgcctc tgctagaact gctctacagg 600
gtcgtaacat atatgacggg tgttgtcagt tggatatcca gttctcaaac cttgaggagc 660
tgcaagtaaa ttacaacaat gatcgatcta gggattatac aaatccaaac ctgcctgcag 720
aacaaaaagg ccgttcatcg catcctggct atggtgatac aggagtggca tatcctcaga 780
tggcaaacac atcagcgatt gcagctgcct ttggaggagg cttgcctccg ggtataaccg 840
gcacaaatga taggtgtaca gtccttgtct ctaacctgaa tgcagatagt attgatgaag 900
ataagctatt caatctattt tctctttacg ggaacattgt aaggattaag cttcttcgga 960
acaaacctga ccatgccctt gtccaaatgg gcaatggttt ccaggctgaa cttgcagtac 1020
atttcctcaa gggagcaatg ctgtttggta agcggttaga agtaaacttc tcaaagcatc 1080
caaatataac accaggcaca gactctcatg attatgtaaa ctctaacctg aaccgcttca 1140
accgcaatgc tgcaaagaat taccgctatt gctgctcccc tacaaagatg attcacttat 1200
cgactcttcc tcaagacgtg acagaggagg aagtgatgaa ccatgtccaa gaacacggtg 1260
cagtagtgaa cacaaaagtg tttgagatga acgggaaaaa gcaagctctg gtacagtttg 1320
agaacgagga ggaagctgcg gaagcattgg tatgcaagca cgcgacttct ctaggcggat 1380
caattatcag aatctccttc tcccagctac agacgattta aaatcacgaa agacctctct 1440
ctctctctct ttctctatct ctctttcctc taataacatt agggtttggt gaggattaat 1500
gaatgtatgt tgcttttaac ttaaaactct ctgctctttg ttctgttctc tctgtttttt 1560
ctttttaacg tgttggtttt aattgtttgt c
(2) INFORMATION FOR SEQ ID NO: 76:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 430 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..430

(D) OTHER INFORMATION: / Ceres Seq. ID 1580922
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
Met Ala Glu Ser Ser Lys Val Val His Val Arg Asn Val Gly His Glu
1 5 10 15

Ile Ser Glu Asn Asp Leu Leu Gln Leu Phe Gln Pro Phe Gly Val Ile
20 25 30

Thr Lys Leu Val Met Leu Arg Ala Lys Asn Gln Ala Leu Leu Gln Met
35 40 45

Gln Asp Val Ser Ser Ala Val Ser Ala Leu Gln Phe Phe Thr Asn Val
50 55 60

Gln Pro Thr Ile Arg Asn Val Tyr Val Gln Phe Ser Ser His Gln Glu
65 70 75 80

Leu Thr Thr Ile Glu Gln Asn Ile His Gly Arg Glu Asp Glu Pro Asn
85 90 95

Arg Ile Leu Leu Val Thr Ile His His Met Leu Tyr Pro Ile Thr Val
100 105 110

Asp Val Leu His Gln Val Phe Ser Pro Tyr Gly Phe Val Glu Lys Leu
115 120 125

Val Thr Phe Gln Lys Ser Ala Gly Phe Gln Ala Leu Ile Gln Tyr Gln
130 135 140

Val Gln Gln Cys Ala Ala Ser Ala Arg Thr Ala Leu Gln Gly Arg Asn
145 150 155 160

Ile Tyr Asp Gly Cys Cys Gln Leu Asp Ile Gln Phe Ser Asn Leu Glu
165 170 175

Glu Leu Gln Val Asn Tyr Asn Asn Asp Arg Ser Arg Asp Tyr Thr Asn
180 185 190

Pro Asn Leu Pro Ala Glu Gln Lys Gly Arg Ser Ser His Pro Gly Tyr
195 200 205

Gly Asp Thr Gly Val Ala Tyr Pro Gln Met Ala Asn Thr Ser Ala Ile
210 215 220

Ala Ala Ala Phe Gly Gly Gly Leu Pro Pro Gly Ile Thr Gly Thr Asn
225 230 235 240

Asp Arg Cys Thr Val Leu Val Ser Asn Leu Asn Ala Asp Ser Ile Asp
245 250 255

Glu Asp Lys Leu Phe Asn Leu Phe Ser Leu Tyr Gly Asn Ile Val Arg
260 265 270

Ile Lys Leu Leu Arg Asn Lys Pro Asp His Ala Leu Val Gln Met Gly
275 280 285

Asn Gly Phe Gln Ala Glu Leu Ala Val His Phe Leu Lys Gly Ala Met
290 295 300

Leu Phe Gly Lys Arg Leu Glu Val Asn Phe Ser Lys His Pro Asn Ile
305 310 315 320

Thr Pro Gly Thr Asp Ser His Asp Tyr Val Asn Ser Asn Leu Asn Arg
325 330 335

Phe Asn Arg Asn Ala Ala Lys Asn Tyr Arg Tyr Cys Cys Ser Pro Thr
340 345 350

Lys Met Ile His Leu Ser Thr Leu Pro Gln Asp Val Thr Glu Glu Glu
355 360 365

Val Met Asn His Val Gln Glu His Gly Ala Val Val Asn Thr Lys Val
370 375 380

Phe Glu Met Asn Gly Lys Lys Gln Ala Leu Val Gln Phe Glu Asn Glu
385 390 395 400

Glu Glu Ala Ala Glu Ala Leu Val Cys Lys His Ala Thr Ser Leu Gly
405 410 415

Gly Ser Ile Ile Arg Ile Ser Phe Ser Gln Leu Gln Thr Ile
420 425 430

(2) INFORMATION FOR SEQ ID NO: 77: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 394 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..394

(D) OTHER INFORMATION: / Ceres Seq. ID 1580923
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:
Met Leu Arg Ala Lys Asn Gln Ala Leu Leu Gln Met Gln Asp Val Ser
1 5 10 15

Ser Ala Val Ser Ala Leu Gln Phe Phe Thr Asn Val Gln Pro Thr Ile
20 25 30

Arg Asn Val Tyr Val Gln Phe Ser Ser His Gln Glu Leu Thr Thr Ile
35 40 45

Glu Gln Asn Ile His Gly Arg Glu Asp Glu Pro Asn Arg Ile Leu Leu
50 55 60

Val Thr Ile His His Met Leu Tyr Pro Ile Thr Val Asp Val Leu His
65 70 75 80

Gln Val Phe Ser Pro Tyr Gly Phe Val Glu Lys Leu Val Thr Phe Gln
85 90 95

Lys Ser Ala Gly Phe Gln Ala Leu Ile Gln Tyr Gln Val Gln Gln Cys
100 105 110

Ala Ala Ser Ala Arg Thr Ala Leu Gln Gly Arg Asn Ile Tyr Asp Gly
115 120 125

Cys Cys Gln Leu Asp Ile Gln Phe Ser Asn Leu Glu Glu Leu Gln Val
130 135 140

Asn Tyr Asn Asn Asp Arg Ser Arg Asp Tyr Thr Asn Pro Asn Leu Pro
145 150 155 160

Ala Glu Gln Lys Gly Arg Ser Ser His Pro Gly Tyr Gly Asp Thr Gly
165 170 175

Val Ala Tyr Pro Gln Met Ala Asn Thr Ser Ala Ile Ala Ala Ala Phe
180 185 190

Gly Gly Gly Leu Pro Pro Gly Ile Thr Gly Thr Asn Asp Arg Cys Thr
195 200 205

Val Leu Val Ser Asn Leu Asn Ala Asp Ser Ile Asp Glu Asp Lys Leu
210 215 220

Phe Asn Leu Phe Ser Leu Tyr Gly Asn Ile Val Arg Ile Lys Leu Leu
225 230 235 240

Arg Asn Lys Pro Asp His Ala Leu Val Gln Met Gly Asn Gly Phe Gln
245 250 255

Ala Glu Leu Ala Val His Phe Leu Lys Gly Ala Met Leu Phe Gly Lys
260 265 270

Arg Leu Glu Val Asn Phe Ser Lys His Pro Asn Ile Thr Pro Gly Thr
275 280 285

Asp Ser His Asp Tyr Val Asn Ser Asn Leu Asn Arg Phe Asn Arg Asn
290 295 300

Ala Ala Lys Asn Tyr Arg Tyr Cys Cys Ser Pro Thr Lys Met Ile His
305 310 315 320

Leu Ser Thr Leu Pro Gln Asp Val Thr Glu Glu Glu Val Met Asn His
325 330 335

Val Gln Glu His Gly Ala Val Val Asn Thr Lys Val Phe Glu Met Asn
340 345 350

Gly Lys Lys Gln Ala Leu Val Gln Phe Glu Asn Glu Glu Glu Ala Ala
355 360 365

Glu Ala Leu Val Cys Lys His Ala Thr Ser Leu Gly Gly Ser Ile Ile
370 375 380

Arg Ile Ser Phe Ser Gln Leu Gln Thr Ile
385 390
(2) INFORMATION FOR SEQ ID NO: 78: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 383 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..383

(D) OTHER INFORMATION: / Ceres Seq. ID 1580924
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
Met Gln Asp Val Ser Ser Ala Val Ser Ala Leu Gln Phe Phe Thr Asn
1 5 10 15

Val Gln Pro Thr Ile Arg Asn Val Tyr Val Gln Phe Ser Ser His Gln
20 25 30

Glu Leu Thr Thr Ile Glu Gln Asn Ile His Gly Arg Glu Asp Glu Pro
35 40 45

Asn Arg Ile Leu Leu Val Thr Ile His His Met Leu Tyr Pro Ile Thr
50 55 60

Val Asp Val Leu His Gln Val Phe Ser Pro Tyr Gly Phe Val Glu Lys
65 70 75 80

Leu Val Thr Phe Gln Lys Ser Ala Gly Phe Gln Ala Leu Ile Gln Tyr
85 90 95

Gln Val Gln Gln Cys Ala Ala Ser Ala Arg Thr Ala Leu Gln Gly Arg
100 105 110

Asn Ile Tyr Asp Gly Cys Cys Gln Leu Asp Ile Gln Phe Ser Asn Leu
115 120 125

Glu Glu Leu Gln Val Asn Tyr Asn Asn Asp Arg Ser Arg Asp Tyr Thr
130 135 140

Asn Pro Asn Leu Pro Ala Glu Gln Lys Gly Arg Ser Ser His Pro Gly
145 150 155 160

Tyr Gly Asp Thr Gly Val Ala Tyr Pro Gln Met Ala Asn Thr Ser Ala
165 170 175

Ile Ala Ala Ala Phe Gly Gly Gly Leu Pro Pro Gly Ile Thr Gly Thr
180 185 190

Asn Asp Arg Cys Thr Val Leu Val Ser Asn Leu Asn Ala Asp Ser Ile
195 200 205

Asp Glu Asp Lys Leu Phe Asn Leu Phe Ser Leu Tyr Gly Asn Ile Val
210 215 220

Arg Ile Lys Leu Leu Arg Asn Lys Pro Asp His Ala Leu Val Gln Met
225 230 235 240

Gly Asn Gly Phe Gln Ala Glu Leu Ala Val His Phe Leu Lys Gly Ala
245 250 255

Met Leu Phe Gly Lys Arg Leu Glu Val Asn Phe Ser Lys His Pro Asn
260 265 270

Ile Thr Pro Gly Thr Asp Ser His Asp Tyr Val Asn Ser Asn Leu Asn
275 280 285

Arg Phe Asn Arg Asn Ala Ala Lys Asn Tyr Arg Tyr Cys Cys Ser Pro
290 295 300

Thr Lys Met Ile His Leu Ser Thr Leu Pro Gln Asp Val Thr Glu Glu
305 310 315 320

Glu Val Met Asn His Val Gln Glu His Gly Ala Val Val Asn Thr Lys
325 330 335

Val Phe Glu Met Asn Gly Lys Lys Gln Ala Leu Val Gln Phe Glu Asn
340 345 350

Glu Glu Glu Ala Ala Glu Ala Leu Val Cys Lys His Ala Thr Ser Leu
355 360 365

Gly Gly Ser Ile Ile Arg Ile Ser Phe Ser Gln Leu Gln Thr Ile
370 375 380

(2) INFORMATION FOR SEQ ID NO: 79: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 601 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..601

(D) OTHER INFORMATION: / Ceres Seq. ID 1580932
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
atagatattg acctcaaact tagggaaacc ctaatataag agcaaaccct agcagcgcta 60
tctccctcag cctcatcttt cacccactac tattcgccgt tcgcttttcc ctcgccgtcg 120
aaatcaacat ggcgccgaag aaggataagg ttccgccacc gtcatcgaag ccggcgaaat 180
ccggaggtgg aaagcagaag aagaagaagt ggagcaaggg aaagcaaaag gagaaggtga 240
acaacatggt gctgtttgat caagctactt atgacaagct tctcactgaa gctcccaagt 300
tcaagctcat caccccttcc attctctctg accgtatgag gatcaatggg tcgctagcaa 360
ggagagcgat tagggagcta atggccaaag gtgtgatcag gatggtcgct gctcactcga 420
gccagcagat ctacactcgt gccaccaaca cctaagatgt ttggtttctt tatccaccta 480
agatgtttgg cttctttttc ttatctcttg ttgtttgatt cttcaggaac ttgttagaat 540
cttttgtttc gtattccgtt tttttgagat atatggttat ttaaaacgca tcaagacttc 600
c

(2) INFORMATION FOR SEQ ID NO: 80:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 108 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..108

(D) OTHER INFORMATION: / Ceres Seq. ID 1580933
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
Met Ala Pro Lys Lys Asp Lys Val Pro Pro Pro Ser Ser Lys Pro Ala
1 5 10 15

Lys Ser Gly Gly Gly Lys Gln Lys Lys Lys Lys Trp Ser Lys Gly Lys
20 25 30

Gln Lys Glu Lys Val Asn Asn Met Val Leu Phe Asp Gln Ala Thr Tyr
35 40 45

Asp Lys Leu Leu Thr Glu Ala Pro Lys Phe Lys Leu Ile Thr Pro Ser
50 55 60

Ile Leu Ser Asp Arg Met Arg Ile Asn Gly Ser Leu Ala Arg Arg Ala
65 70 75 80

Ile Arg Glu Leu Met Ala Lys Gly Val Ile Arg Met Val Ala Ala His
85 90 95

Ser Ser Gln Gln Ile Tyr Thr Arg Ala Thr Asn Thr
100 105

(2) INFORMATION FOR SEQ ID NO: 81:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 69 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..69

(D) OTHER INFORMATION: / Ceres Seq. ID 1580934
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
Met Val Leu Phe Asp Gln Ala Thr Tyr Asp Lys Leu Leu Thr Glu Ala
1 5 10 15

Pro Lys Phe Lys Leu Ile Thr Pro Ser Ile Leu Ser Asp Arg Met Arg
20 25 30

Ile Asn Gly Ser Leu Ala Arg Arg Ala Ile Arg Glu Leu Met Ala Lys
35 40 45

Gly Val Ile Arg Met Val Ala Ala His Ser Ser Gln Gln Ile Tyr Thr
50 55 60

Arg Ala Thr Asn Thr
65

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1206 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1206 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580939 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 
attcagaaaa attttggact tattttcgat atgtttaggg gaaaaatatt tgtgttatca 60
ttattttcta tttacgtcct ctcatcagcg gctgaaaaaa atacatcatt ttctgcgctt 120
tttgcttttg gtgattcagt cttagatacg ggaaacaata attttcttct tactcttctc 180
aaaggaaatt attggccata tggtttgagt tttgactata aatttccgac gggaagattc 240
ggaaatggaa gagttttcac cgatatagtt gccgaaggtt tacagatcaa aagacttgta 300
ccagcttata gcaagattcg tcgcatcagt tctgaagacc ttaagactgg cgtttgtttt 360
gcatcaggtg gttcaggaat tgacgatctt acatcccgaa cactgagagt tttatcggca 420
ggtgaccaag taaaagactt taaagactat ttgaagaaac taagaagagt agtgaaacgt 480
atgaaaaaag taaaagaaat agtttcaaac gctgtgtttc ttatttctga aggaaataat 540
gatcttggat actttgtcgc tcccgctctt cttcgattac agtccacaac aacatacacc 600
agtaaaatgg ttgtttggac cagaaagttt ctaaaagatt tatatgatct tggagcgaga 660
aaatttgcgg tgatgggagt gatgcctgtg ggatgtttgc ctatccatcg agcctcattc 720
ggtggggtat tcggatggtg taacttcctc ttgaatagag ttacggagga cttcaacatg 780
aaattgcaga aaggtcttac aagttacgca gtagaatatg atttcaaaga tgcaaagttt 840
gtttacgtcg acatatacgg tactcttatg gatcttgtca aaaatcctat ggcctatggg 900
tttacagaag cgaaaaaagc ttgttgttgt atgccgaatg caataatacc atgcttccat 960
ccagataaat acgtcttcta tgactttgct cacccttccc agaaagccta cgaagtaata 1020
tctaaaccta ttgtctatca gatagcgaaa ggccttgcct agctaattcc ttatttccta 1080
caatgtttta ttataatctt tgaaaattat caccatatcg ataaatatat gtaaatgtat 1140
aagattcatg acaatctatt cgtttttctt attgtaaaca atgtataatt caaattacag 1200
gctttc

(2) INFORMATION FOR SEQ ID NO: 83:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 353 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..353

(D) OTHER INFORMATION: / Ceres Seq. ID 1580940
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
Ile Gln Lys Asn Phe Gly Leu Ile Phe Asp Met Phe Arg Gly Lys Ile
1 5 10 15

Phe Val Leu Ser Leu Phe Ser Ile Tyr Val Leu Ser Ser Ala Ala Glu
20 25 30

Lys Asn Thr Ser Phe Ser Ala Leu Phe Ala Phe Gly Asp Ser Val Leu
35 40 45

Asp Thr Gly Asn Asn Asn Phe Leu Leu Thr Leu Leu Lys Gly Asn Tyr
50 55 60

Trp Pro Tyr Gly Leu Ser Phe Asp Tyr Lys Phe Pro Thr Gly Arg Phe
65 70 75 80

Gly Asn Gly Arg Val Phe Thr Asp Ile Val Ala Glu Gly Leu Gln Ile
85 90 95

Lys Arg Leu Val Pro Ala Tyr Ser Lys Ile Arg Arg Ile Ser Ser Glu
100 105 110

Asp Leu Lys Thr Gly Val Cys Phe Ala Ser Gly Gly Ser Gly Ile Asp
115 120 125

Asp Leu Thr Ser Arg Thr Leu Arg Val Leu Ser Ala Gly Asp Gln Val
130 135 140

Lys Asp Phe Lys Asp Tyr Leu Lys Lys Leu Arg Arg Val Val Lys Arg
145 150 155 160

Met Lys Lys Val Lys Glu Ile Val Ser Asn Ala Val Phe Leu Ile Ser
165 170 175

Glu Gly Asn Asn Asp Leu Gly Tyr Phe Val Ala Pro Ala Leu Leu Arg
180 185 190

Leu Gln Ser Thr Thr Thr Tyr Thr Ser Lys Met Val Val Trp Thr Arg
195 200 205

Lys Phe Leu Lys Asp Leu Tyr Asp Leu Gly Ala Arg Lys Phe Ala Val
210 215 220

Met Gly Val Met Pro Val Gly Cys Leu Pro Ile His Arg Ala Ser Phe
225 230 235 240

Gly Gly Val Phe Gly Trp Cys Asn Phe Leu Leu Asn Arg Val Thr Glu
245 250 255

Asp Phe Asn Met Lys Leu Gln Lys Gly Leu Thr Ser Tyr Ala Val Glu
260 265 270

Tyr Asp Phe Lys Asp Ala Lys Phe Val Tyr Val Asp Ile Tyr Gly Thr
275 280 285

Leu Met Asp Leu Val Lys Asn Pro Met Ala Tyr Gly Phe Thr Glu Ala
290 295 300

Lys Lys Ala Cys Cys Cys Met Pro Asn Ala Ile Ile Pro Cys Phe His
305 310 315 320

Pro Asp Lys Tyr Val Phe Tyr Asp Phe Ala His Pro Ser Gln Lys Ala
325 330 335

Tyr Glu Val Ile Ser Lys Pro Ile Val Tyr Gln Ile Ala Lys Gly Leu
340 345 350

Ala

(2) INFORMATION FOR SEQ ID NO: 84: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..343 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580941 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 



Met Phe Arg Gly Lys Ile Phe Val Leu Ser Leu Phe Ser Ile Tyr Val
1 5 10 15

Leu Ser Ser Ala Ala Glu Lys Asn Thr Ser Phe Ser Ala Leu Phe Ala
20 25 30

Phe Gly Asp Ser Val Leu Asp Thr Gly Asn Asn Asn Phe Leu Leu Thr
35 40 45

Leu Leu Lys Gly Asn Tyr Trp Pro Tyr Gly Leu Ser Phe Asp Tyr Lys
50 55 60

Phe Pro Thr Gly Arg Phe Gly Asn Gly Arg Val Phe Thr Asp Ile Val
65 70 75 80

Ala Glu Gly Leu Gln Ile Lys Arg Leu Val Pro Ala Tyr Ser Lys Ile
85 90 95

Arg Arg Ile Ser Ser Glu Asp Leu Lys Thr Gly Val Cys Phe Ala Ser
100 105 110

Gly Gly Ser Gly Ile Asp Asp Leu Thr Ser Arg Thr Leu Arg Val Leu
115 120 125

Ser Ala Gly Asp Gln Val Lys Asp Phe Lys Asp Tyr Leu Lys Lys Leu
130 135 140

Arg Arg Val Val Lys Arg Met Lys Lys Val Lys Glu Ile Val Ser Asn
145 150 155 160

Ala Val Phe Leu Ile Ser Glu Gly Asn Asn Asp Leu Gly Tyr Phe Val
165 170 175

Ala Pro Ala Leu Leu Arg Leu Gln Ser Thr Thr Thr Tyr Thr Ser Lys
180 185 190

Met Val Val Trp Thr Arg Lys Phe Leu Lys Asp Leu Tyr Asp Leu Gly
195 200 205

Ala Arg Lys Phe Ala Val Met Gly Val Met Pro Val Gly Cys Leu Pro
210 215 220

Ile His Arg Ala Ser Phe Gly Gly Val Phe Gly Trp Cys Asn Phe Leu
225 230 235 240

Leu Asn Arg Val Thr Glu Asp Phe Asn Met Lys Leu Gln Lys Gly Leu
245 250 255

Thr Ser Tyr Ala Val Glu Tyr Asp Phe Lys Asp Ala Lys Phe Val Tyr
260 265 270

Val Asp Ile Tyr Gly Thr Leu Met Asp Leu Val Lys Asn Pro Met Ala
275 280 285

Tyr Gly Phe Thr Glu Ala Lys Lys Ala Cys Cys Cys Met Pro Asn Ala
290 295 300

Ile Ile Pro Cys Phe His Pro Asp Lys Tyr Val Phe Tyr Asp Phe Ala
305 310 315 320

His Pro Ser Gln Lys Ala Tyr Glu Val Ile Ser Lys Pro Ile Val Tyr
325 330 335

Gln Ile Ala Lys Gly Leu Ala
340



(2) INFORMATION FOR SEQ ID NO: 85: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 193 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..193 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580942 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 
Met Lys Lys Val Lys Glu Ile Val Ser Asn Ala Val Phe Leu Ile Ser
1 5 10 15

Glu Gly Asn Asn Asp Leu Gly Tyr Phe Val Ala Pro Ala Leu Leu Arg
20 25 30

Leu Gln Ser Thr Thr Thr Tyr Thr Ser Lys Met Val Val Trp Thr Arg
35 40 45

Lys Phe Leu Lys Asp Leu Tyr Asp Leu Gly Ala Arg Lys Phe Ala Val
50 55 60

Met Gly Val Met Pro Val Gly Cys Leu Pro Ile His Arg Ala Ser Phe
65 70 75 80

Gly Gly Val Phe Gly Trp Cys Asn Phe Leu Leu Asn Arg Val Thr Glu
85 90 95

Asp Phe Asn Met Lys Leu Gln Lys Gly Leu Thr Ser Tyr Ala Val Glu
100 105 110

Tyr Asp Phe Lys Asp Ala Lys Phe Val Tyr Val Asp Ile Tyr Gly Thr
115 120 125

Leu Met Asp Leu Val Lys Asn Pro Met Ala Tyr Gly Phe Thr Glu Ala
130 135 140

Lys Lys Ala Cys Cys Cys Met Pro Asn Ala Ile Ile Pro Cys Phe His
145 150 155 160

Pro Asp Lys Tyr Val Phe Tyr Asp Phe Ala His Pro Ser Gln Lys Ala
165 170 175

Tyr Glu Val Ile Ser Lys Pro Ile Val Tyr Gln Ile Ala Lys Gly Leu
180 185 190

Ala



(2) INFORMATION FOR SEQ ID NO: 86: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 499 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..499

(D) OTHER INFORMATION: / Ceres Seq. ID 1580968 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420 
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tgtgacccaa taaatacgat 480 
tgttatttga ttaaggacc 
(2) INFORMATION FOR SEQ ID NO: 87: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1580969 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 



Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 88:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580970 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 89: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2098 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..2098 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580983 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 
attcatcaaa agccacataa gaagttgatg cagaagcaat aataacaaaa aacattaaaa 60
aaaaacctct tcagttttgg tttctgcagt ctccaatggc tctcaagctc tttctctttg 120
ctcttcttct ctgtctcccg acttctctct cctccactgc ctctaaaggt aaagagaaga 180
agtcgaagtt taatccatac agatacacat tcatcgataa agcaagcaca ttctcatcat 240
cttcctcatc ttcattttca tccaacggtc aagattcgtc ctacgactac atagtcatcg 300
gaggtggaac cgcaggtgtc ctctcgccgc aacgttgtcg cagaatttca gcgttcttgt 360
tttagagaga ggtggcgttc cgtttacaaa cgcaaacgtt tctttcctca ggaattttca 420
catcggactt gctgacattt cagcttcttc cgcgtctcaa gcgtttgttt ccactgacgg 480
cgtttacaac gctcgtgcta gagttctcgg tggcggttcc tgtattaacg ccggttttta 540
ctccagagcc gatgctgcgt tcgtgaagcg agcaggatgg gatccgaagc tggtgaagga 600
atcgtatcca tgggtggaga gagagattgt tcatcagcca aagttaacgt tatggcagaa 660
agctctcaga gacagtcttt tagaggttgg agtcagacct ttcaatggtt tcacctacga 720
tcacgtctcc ggaaccaaaa tcggcgggac aattttcgat agattcggcc gtcgtcacac 780
ggcggcggag cttctcgctt acgctaatcc tcagaagctt agagtcttga tctacgccac 840
tgtgcaaaaa atcgtctttg acacttcagg aacaaggcct agagtaacag gagtgatatt 900
caaagatgag aaaggtaatc aacaccaggc tttactctcg aatagaaagg gaagtgaagt 960
gatattatct agtggagcta ttgggtgacc acagatgctg atgttaagtg ggattggacc 1020
taagaaggag cttcagaggc tgaagattcc tgtggtttta gagaatgagc atgtagggaa 1080
aggaatggct gataatccca tgaacacgat cttggtacct tcaaaggcgc ctatagagca 1140
gtcacttatt cagactgttg gaattacaaa gatgggtgtg tatgttgaag ccagtactgg 1200
ctttgggcaa tctcctgaga gtattcatac tcactatggg attatgtcga acaagaatga 1260
attgttttcc accattcctg caaagcagag aagaccagaa gcaacgcaag cttacatcac 1320
aagaaacaaa taccaacttc acgaagcatt caatggaagt ttcatcttgg agaaacttgc 1380
ttacccgatc tctagagggc atttgagctt ggtcaacaca aatgttgatg acaacccttc 1440
agtcaccttc aattacttta aacacccggt ggatctccaa cgctgtgttg aagccattcg 1500
tcttgtttcc aaagtggtga cgtctaagcg tttcttaaac tacacgcagt gtgacaagca 1560
aaacgtacac aagatgctta gcttaagcgt caaggcaaac atcaatctaa ggccaaagca 1620
actgaacgat accaaatcaa tggctcagtt ctgcaaagac actgttgtca caatctggca 1680
ctaccatggt ggatgtcttg tgggtaaagt tgtgagccct aaccgcaaag ttcttggtgt 1740
cgacaggctt agagttattg atggttcaac gtttgacgag tctccaggaa ccaacccgca 1800
agctactatg atgatgatgg gaagatacat gggagtcaag attcttcggg agagacttgg 1860
aaacaaagct ggtgtttagt ttgcagattg agcttttatg gtagacaaat tcgtagcaga 1920
taattctgat gtggaattgt gttggagaat atctctctct gtctccttct ctgttatttg 1980
atattcgatt cattgaagta taggatcata ttgtctaatg aactgtgtaa ccctctattg 2040
ggcaatcggc tctgttgctt attagcttgt gtgaaaagtt aatcacgttt tctgtttc
(2) INFORMATION FOR SEQ ID NO: 90: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 294 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..294

(D) OTHER INFORMATION: / Ceres Seq. ID 1580984

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:



Met Leu Met Leu Ser Gly Ile Gly Pro Lys Lys Glu Leu Gln Arg Leu
1 5 10 15

Lys Ile Pro Val Val Leu Glu Asn Glu His Val Gly Lys Gly Met Ala
20 25 30

Asp Asn Pro Met Asn Thr Ile Leu Val Pro Ser Lys Ala Pro Ile Glu
35 40 45

Gln Ser Leu Ile Gln Thr Val Gly Ile Thr Lys Met Gly Val Tyr Val
50 55 60

Glu Ala Ser Thr Gly Phe Gly Gln Ser Pro Glu Ser Ile His Thr His
65 70 75 80

Tyr Gly Ile Met Ser Asn Lys Asn Glu Leu Phe Ser Thr Ile Pro Ala
85 90 95

Lys Gln Arg Arg Pro Glu Ala Thr Gln Ala Tyr Ile Thr Arg Asn Lys
100 105 110

Tyr Gln Leu His Glu Ala Phe Asn Gly Ser Phe Ile Leu Glu Lys Leu
115 120 125

Ala Tyr Pro Ile Ser Arg Gly His Leu Ser Leu Val Asn Thr Asn Val
130 135 140

Asp Asp Asn Pro Ser Val Thr Phe Asn Tyr Phe Lys His Pro Val Asp
145 150 155 160

Leu Gln Arg Cys Val Glu Ala Ile Arg Leu Val Ser Lys Val Val Thr
165 170 175

Ser Lys Arg Phe Leu Asn Tyr Thr Gln Cys Asp Lys Gln Asn Val His
180 185 190

Lys Met Leu Ser Leu Ser Val Lys Ala Asn Ile Asn Leu Arg Pro Lys
195 200 205

Gln Leu Asn Asp Thr Lys Ser Met Ala Gln Phe Cys Lys Asp Thr Val
210 215 220

Val Thr Ile Trp His Tyr His Gly Gly Cys Leu Val Gly Lys Val Val
225 230 235 240

Ser Pro Asn Arg Lys Val Leu Gly Val Asp Arg Leu Arg Val Ile Asp
245 250 255

Gly Ser Thr Phe Asp Glu Ser Pro Gly Thr Asn Pro Gln Ala Thr Met
260 265 270

Met Met Met Gly Arg Tyr Met Gly Val Lys Ile Leu Arg Glu Arg Leu
275 280 285

Gly Asn Lys Ala Gly Val
290

(2) INFORMATION FOR SEQ ID NO: 91: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 292 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 



(B) LOCATION: 1..292 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580985 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 



Met Leu Ser Gly Ile Gly Pro Lys Lys Glu Leu Gln Arg Leu Lys Ile
1 5 10 15

Pro Val Val Leu Glu Asn Glu His Val Gly Lys Gly Met Ala Asp Asn
20 25 30

Pro Met Asn Thr Ile Leu Val Pro Ser Lys Ala Pro Ile Glu Gln Ser
35 40 45

Leu Ile Gln Thr Val Gly Ile Thr Lys Met Gly Val Tyr Val Glu Ala
50 55 60

Ser Thr Gly Phe Gly Gln Ser Pro Glu Ser Ile His Thr His Tyr Gly
65 70 75 80

Ile Met Ser Asn Lys Asn Glu Leu Phe Ser Thr Ile Pro Ala Lys Gln
85 90 95

Arg Arg Pro Glu Ala Thr Gln Ala Tyr Ile Thr Arg Asn Lys Tyr Gln
100 105 110

Leu His Glu Ala Phe Asn Gly Ser Phe Ile Leu Glu Lys Leu Ala Tyr
115 120 125

Pro Ile Ser Arg Gly His Leu Ser Leu Val Asn Thr Asn Val Asp Asp
130 135 140

Asn Pro Ser Val Thr Phe Asn Tyr Phe Lys His Pro Val Asp Leu Gln
145 150 155 160

Arg Cys Val Glu Ala Ile Arg Leu Val Ser Lys Val Val Thr Ser Lys
165 170 175

Arg Phe Leu Asn Tyr Thr Gln Cys Asp Lys Gln Asn Val His Lys Met
180 185 190

Leu Ser Leu Ser Val Lys Ala Asn Ile Asn Leu Arg Pro Lys Gln Leu
195 200 205

Asn Asp Thr Lys Ser Met Ala Gln Phe Cys Lys Asp Thr Val Val Thr
210 215 220

Ile Trp His Tyr His Gly Gly Cys Leu Val Gly Lys Val Val Ser Pro
225 230 235 240

Asn Arg Lys Val Leu Gly Val Asp Arg Leu Arg Val Ile Asp Gly Ser
245 250 255

Thr Phe Asp Glu Ser Pro Gly Thr Asn Pro Gln Ala Thr Met Met Met
260 265 270

Met Gly Arg Tyr Met Gly Val Lys Ile Leu Arg Glu Arg Leu Gly Asn
275 280 285

Lys Ala Gly Val
290

(2) INFORMATION FOR SEQ ID NO: 92: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..264 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580986 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 
Met Ala Asp Asn Pro Met Asn Thr Ile Leu Val Pro Ser Lys Ala Pro
1 5 10 15

Ile Glu Gln Ser Leu Ile Gln Thr Val Gly Ile Thr Lys Met Gly Val
20 25 30

Tyr Val Glu Ala Ser Thr Gly Phe Gly Gln Ser Pro Glu Ser Ile His
35 40 45

Thr His Tyr Gly Ile Met Ser Asn Lys Asn Glu Leu Phe Ser Thr Ile
50 55 60

Pro Ala Lys Gln Arg Arg Pro Glu Ala Thr Gln Ala Tyr Ile Thr Arg
65 70 75 80

Asn Lys Tyr Gln Leu His Glu Ala Phe Asn Gly Ser Phe Ile Leu Glu
85 90 95

Lys Leu Ala Tyr Pro Ile Ser Arg Gly His Leu Ser Leu Val Asn Thr
100 105 110

Asn Val Asp Asp Asn Pro Ser Val Thr Phe Asn Tyr Phe Lys His Pro
115 120 125

Val Asp Leu Gln Arg Cys Val Glu Ala Ile Arg Leu Val Ser Lys Val
130 135 140

Val Thr Ser Lys Arg Phe Leu Asn Tyr Thr Gln Cys Asp Lys Gln Asn
145 150 155 160

Val His Lys Met Leu Ser Leu Ser Val Lys Ala Asn Ile Asn Leu Arg
165 170 175

Pro Lys Gln Leu Asn Asp Thr Lys Ser Met Ala Gln Phe Cys Lys Asp
180 185 190

Thr Val Val Thr Ile Trp His Tyr His Gly Gly Cys Leu Val Gly Lys
195 200 205

Val Val Ser Pro Asn Arg Lys Val Leu Gly Val Asp Arg Leu Arg Val
210 215 220

Ile Asp Gly Ser Thr Phe Asp Glu Ser Pro Gly Thr Asn Pro Gln Ala
225 230 235 240

Thr Met Met Met Met Gly Arg Tyr Met Gly Val Lys Ile Leu Arg Glu
245 250 255

Arg Leu Gly Asn Lys Ala Gly Val
260

(2) INFORMATION FOR SEQ ID NO: 93:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1318 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1318 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580995 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 
atctcgaatc cgattccttt tttctcttct gcaatctctt tgaaacctca agcttctctt 60
caaaatttcg attacagctg aagattctct cgattttctg gttttgactt tttttttgtt 120
tggtttggtg aaagatttgg atcaaatggg ggtggattac tacaatgttc tgaaagtgaa 180
tcgaaatgct aacgaagatg atctgaagaa gtcttaccgg cgaatggcta tgaaatggca 240
ccctgataaa aaccctacga gcaagaaaga agctgaagct aagttcaagc agatctctga 300
agcttacgac gtcttgagcg atcctcagag acgtcagatc tacgatcagt acggagaaga 360
aggtcttaaa tccaccgatt tacctactgc ggcggagacg gcggcgcaGc agcagcagcg 420
gaGctactct tctagcaact ctgagtttcg gtattatccg cgtgatgctg aagatatctt 480
tgctgagttt ttcggcgaat ccggtgatgc ttttggcgga gggagtagcg gaaggacacg 540
tggagatggc ggtgatggtg gcggtcggag atttaaaagt gcagaagcag ggagccaggc 600
taatagaaag acgccgccga cgaacaaaaa aacgacgccg ccggcgaata ggaaggctcc 660
ggctattgag agtaaattgg cttgcacatt ggaggagctc tacaaaggtg caaagaagaa 720
gatgcgaatt tctcgcgttg ttcctgatga ttttggaaag ccaaagacag tccaagagat 780
attgaagata gacataaaac caggttggaa gaaaggcaca aagatcactt ttccagagaa 840
aggcaaccaa gaacctggtg tcactCctgc ggatcttata ttcgtggttg acgagaaacc 900
gcattcggta ttcaagagag acggtaacga tctgattctc gagaagaaag tctctcttat 960
agatgcttta accggtctca ccattagcgt aacgactcta gacgggagga gcctaactat 1020
cccggtgctg gatatcgtaa aaccgggtca ggagattgtg atcccgaatg agggaatgcc 1080
tactaaagac cctttgaaga gaggagacct tagagttacc tttgaaatct tgttcccgtc 1140
aaggctaacg tcagaacaga agaatgacct caagagagtt cttggtggaa gctgatgagt 1200
tgacgtttag ttttcaagta agctattgat cttctttaac aacggattcc acaggaagtt 1260
gctagtcatt tgagtgtaca tacatttacg tttttccttt gtaaacttgt gttataag
(2) INFORMATION FOR SEQ ID NO: 94:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide
(B) LOCATION: 1..349 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580996 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 
Met Gly Val Asp Tyr Tyr Asn Val Leu Lys Val Asn Arg Asn Ala Asn
1 5 10 15

Glu Asp Asp Leu Lys Lys Ser Tyr Arg Arg Met Ala Met Lys Trp His
20 25 30

Pro Asp Lys Asn Pro Thr Ser Lys Lys Glu Ala Glu Ala Lys Phe Lys
35 40 45

Gln Ile Ser Glu Ala Tyr Asp Val Leu Ser Asp Pro Gln Arg Arg Gln
50 55 60

Ile Tyr Asp Gln Tyr Gly Glu Glu Gly Leu Lys Ser Thr Asp Leu Pro
65 70 75 80

Thr Ala Ala Glu Thr Ala Ala Gln Gln Gln Gln Arg Ser Tyr Ser Ser
85 90 95

Ser Asn Ser Glu Phe Arg Tyr Tyr Pro Arg Asp Ala Glu Asp Ile Phe
100 105 110

Ala Glu Phe Phe Gly Glu Ser Gly Asp Ala Phe Gly Gly Gly Ser Ser
115 120 125

Gly Arg Thr Arg Gly Asp Gly Gly Asp Gly Gly Gly Arg Arg Phe Lys
130 135 140

Ser Ala Glu Ala Gly Ser Gln Ala Asn Arg Lys Thr Pro Pro Thr Asn
145 150 155 160

Lys Lys Thr Thr Pro Pro Ala Asn Arg Lys Ala Pro Ala Ile Glu Ser
165 170 175

Lys Leu Ala Cys Thr Leu Glu Glu Leu Tyr Lys Gly Ala Lys Lys Lys
180 185 190

Met Arg Ile Ser Arg Val Val Pro Asp Asp Phe Gly Lys Pro Lys Thr
195 200 205

Val Gln Glu Ile Leu Lys Ile Asp Ile Lys Pro Gly Trp Lys Lys Gly
210 215 220

Thr Lys Ile Thr Phe Pro Glu Lys Gly Asn Gln Glu Pro Gly Val Thr
225 230 235 240

Pro Ala Asp Leu Ile Phe Val Val Asp Glu Lys Pro His Ser Val Phe
245 250 255

Lys Arg Asp Gly Asn Asp Leu Ile Leu Glu Lys Lys Val Ser Leu Ile
260 265 270

Asp Ala Leu Thr Gly Leu Thr Ile Ser Val Thr Thr Leu Asp Gly Arg
275 280 285

Ser Leu Thr Ile Pro Val Leu Asp Ile Val Lys Pro Gly Gln Glu Ile
290 295 300

Val Ile Pro Asn Glu Gly Met Pro Thr Lys Asp Pro Leu Lys Arg Gly
305 310 315 320

Asp Leu Arg Val Thr Phe Glu Ile Leu Phe Pro Ser Arg Leu Thr Ser
325 330 335

Glu Gln Lys Asn Asp Leu Lys Arg Val Leu Gly Gly Ser
340 345
(2) INFORMATION FOR SEQ ID NO: 95: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 323 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..323 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580997 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 
Met Ala Met Lys Trp His Pro Asp Lys Asn Pro Thr Ser Lys Lys Glu
1 5 10 15

Ala Glu Ala Lys Phe Lys Gln Ile Ser Glu Ala Tyr Asp Val Leu Ser
20 25 30

Asp Pro Gln Arg Arg Gln Ile Tyr Asp Gln Tyr Gly Glu Glu Gly Leu
35 40 45

Lys Ser Thr Asp Leu Pro Thr Ala Ala Glu Thr Ala Ala Gln Gln Gln
50 55 60

Gln Arg Ser Tyr Ser Ser Ser Asn Ser Glu Phe Arg Tyr Tyr Pro Arg
65 70 75 80

Asp Ala Glu Asp Ile Phe Ala Glu Phe Phe Gly Glu Ser Gly Asp Ala
85 90 95

Phe Gly Gly Gly Ser Ser Gly Arg Thr Arg Gly Asp Gly Gly Asp Gly
100 105 110

Gly Gly Arg Arg Phe Lys Ser Ala Glu Ala Gly Ser Gln Ala Asn Arg
115 120 125

Lys Thr Pro Pro Thr Asn Lys Lys Thr Thr Pro Pro Ala Asn Arg Lys
130 135 140

Ala Pro Ala Ile Glu Ser Lys Leu Ala Cys Thr Leu Glu Glu Leu Tyr
145 150 155 160

Lys Gly Ala Lys Lys Lys Met Arg Ile Ser Arg Val Val Pro Asp Asp
165 170 175

Phe Gly Lys Pro Lys Thr Val Gln Glu Ile Leu Lys Ile Asp Ile Lys
180 185 190

Pro Gly Trp Lys Lys Gly Thr Lys Ile Thr Phe Pro Glu Lys Gly Asn
195 200 205

Gln Glu Pro Gly Val Thr Pro Ala Asp Leu Ile Phe Val Val Asp Glu
210 215 220

Lys Pro His Ser Val Phe Lys Arg Asp Gly Asn Asp Leu Ile Leu Glu
225 230 235 240

Lys Lys Val Ser Leu Ile Asp Ala Leu Thr Gly Leu Thr Ile Ser Val
245 250 255

Thr Thr Leu Asp Gly Arg Ser Leu Thr Ile Pro Val Leu Asp Ile Val
260 265 270

Lys Pro Gly Gln Glu Ile Val Ile Pro Asn Glu Gly Met Pro Thr Lys
275 280 285

Asp Pro Leu Lys Arg Gly Asp Leu Arg Val Thr Phe Glu Ile Leu Phe
290 295 300

Pro Ser Arg Leu Thr Ser Glu Gln Lys Asn Asp Leu Lys Arg Val Leu
305 310 315 320

Gly Gly Ser

(2) INFORMATION FOR SEQ ID NO: 96: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 321 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..321 

(D) OTHER INFORMATION: / Ceres Seq. ID 1580998 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 
Met Lys Trp His Pro Asp Lys Asn Pro Thr Ser Lys Lys Glu Ala Glu
1 5 10 15

Ala Lys Phe Lys Gln Ile Ser Glu Ala Tyr Asp Val Leu Ser Asp Pro
20 25 30

Gln Arg Arg Gln Ile Tyr Asp Gln Tyr Gly Glu Glu Gly Leu Lys Ser
35 40 45

Thr Asp Leu Pro Thr Ala Ala Glu Thr Ala Ala Gln Gln Gln Gln Arg
50 55 60

Ser Tyr Ser Ser Ser Asn Ser Glu Phe Arg Tyr Tyr Pro Arg Asp Ala
65 70 75 80

Glu Asp Ile Phe Ala Glu Phe Phe Gly Glu Ser Gly Asp Ala Phe Gly
85 90 95

Gly Gly Ser Ser Gly Arg Thr Arg Gly Asp Gly Gly Asp Gly Gly Gly
100 105 110

Arg Arg Phe Lys Ser Ala Glu Ala Gly Ser Gln Ala Asn Arg Lys Thr
115 120 125

Pro Pro Thr Asn Lys Lys Thr Thr Pro Pro Ala Asn Arg Lys Ala Pro
130 135 140

Ala Ile Glu Ser Lys Leu Ala Cys Thr Leu Glu Glu Leu Tyr Lys Gly
145 150 155 160

Ala Lys Lys Lys Met Arg Ile Ser Arg Val Val Pro Asp Asp Phe Gly
165 170 175

Lys Pro Lys Thr Val Gln Glu Ile Leu Lys Ile Asp Ile Lys Pro Gly
180 185 190

Trp Lys Lys Gly Thr Lys Ile Thr Phe Pro Glu Lys Gly Asn Gln Glu
195 200 205

Pro Gly Val Thr Pro Ala Asp Leu Ile Phe Val Val Asp Glu Lys Pro
210 215 220

His Ser Val Phe Lys Arg Asp Gly Asn Asp Leu Ile Leu Glu Lys Lys
225 230 235 240

Val Ser Leu Ile Asp Ala Leu Thr Gly Leu Thr Ile Ser Val Thr Thr
245 250 255

Leu Asp Gly Arg Ser Leu Thr Ile Pro Val Leu Asp Ile Val Lys Pro
260 265 270

Gly Gln Glu Ile Val Ile Pro Asn Glu Gly Met Pro Thr Lys Asp Pro
275 280 285

Leu Lys Arg Gly Asp Leu Arg Val Thr Phe Glu Ile Leu Phe Pro Ser
290 295 300

Arg Leu Thr Ser Glu Gln Lys Asn Asp Leu Lys Arg Val Leu Gly Gly
305 310 315 320

Ser

(2) INFORMATION FOR SEQ ID NO: 97: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1168 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1168 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581038 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
atcaaaaaat ctctatccat ccacaacaac aacaaaaact caagaacaat atctccactt 60 
gatcactacc acattgtcca atcactctac aaagcctgta cgtacacaac aacattacca 120 
tggtgaaaca agaacgcaag atccaaacca gcagcacaaa aaaggaaatg cctttgtcat 18 0 

catcaccatc ttcttcttct tcttcttcat cttcttcctc gtcttcgtct tcgtgtaaga 240 
acaagaacaa gaagagtaag attaagaagt acaaaggagt gaggatgaga agttggggat 300 
catgggtctc tgagattagg gcaccaaatc aaaagacaag gatttggtta ggttcttact 360 
caacagctga agcagctgct agagcttacg atgttgcact cttatgtctc aaaggccctc 420 
aagccaatct caacttccct acttcttctt cttctcatca tcttcttgat aatctcttag 480 
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atgaaaatac ccttttgtcc cccaaatcca tccaaagagt agctgctcaa gctgccaact 
catttaacca ttttgcccct acttcatcag ccgtctcgtc accgtccgat catgatcatc 
accatgatga tgggatgcaa tctttgatgg gatcttttgt ggacaatcat gtgtctttga 
tggattcaac atcttcatgg tatgatgatc ataatgggat gttcttgttt gataatggag 
ctccattcaa ttactctcct caactaaact cgacgacgat gctcgatgaa tacttctacg 
aagatgctga cattccgctt tggagtttca attaatccga cggtccataa tacatacttt 
aattagtttt ctaaatcatt ttttaaattc ttttttatat atgtaacata tatttaccct 
cgtcgtaatt attcatttcc aagttgtttg ttgcttggag agaattgacg acggatacga 
ttatataata taattgtaag tttttatatg tagcaaactc ggtacaattc ggatttaata 
gggatctagg aagttcacat gaacaagcga ctcttttttt tgtcacttct ttgacttttt 
catgtgaggt ttcttttcat caattttttt taacaatgtg tgaaaatatt tctgtattgt 
aatgatatat ttcgcacaat agttttgc 
(2) INFORMATION FOR SEQ ID NO: 98:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 231 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..231

(D) OTHER INFORMATION: / Ceres Seq. ID 1581039
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:
Met Val Lys Gln Glu Arg Lys Ile Gln Thr Ser Ser Thr Lys Lys Glu
1 5 10 15

Met Pro Leu Ser Ser Ser Pro Ser Ser Ser Ser Ser Ser Ser Ser Ser
20 25 30

Ser Ser Ser Ser Ser Ser Cys Lys Asn Lys Asn Lys Lys Ser Lys Ile
35 40 45

Lys Lys Tyr Lys Gly Val Arg Met Arg Ser Trp Gly Ser Trp Val Ser
50 55 60

Glu Ile Arg Ala Pro Asn Gln Lys Thr Arg Ile Trp Leu Gly Ser Tyr
65 70 75 80

Ser Thr Ala Glu Ala Ala Ala Arg Ala Tyr Asp Val Ala Leu Leu Cys
85 90 95

Leu Lys Gly Pro Gln Ala Asn Leu Asn Phe Pro Thr Ser Ser Ser Ser
100 105 110

His His Leu Leu Asp Asn Leu Leu Asp Glu Asn Thr Leu Leu Ser Pro
115 120 125

Lys Ser Ile Gln Arg Val Ala Ala Gln Ala Ala Asn Ser Phe Asn His
130 135 140

Phe Ala Pro Thr Ser Ser Ala Val Ser Ser Pro Ser Asp His Asp His
145 150 155 160

His His Asp Asp Gly Met Gln Ser Leu Met Gly Ser Phe Val Asp Asn
165 170 175

His Val Ser Leu Met Asp Ser Thr Ser Ser Trp Tyr Asp Asp His Asn
180 185 190

Gly Met Phe Leu Phe Asp Asn Gly Ala Pro Phe Asn Tyr Ser Pro Gln
195 200 205

Leu Asn Ser Thr Thr Met Leu Asp Glu Tyr Phe Tyr Glu Asp Ala Asp
210 215 220

Ile Pro Leu Trp Ser Phe Asn
225 230
(2) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 215 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..215

(D) OTHER INFORMATION: / Ceres Seq. ID 1581040
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:



Met Pro Leu Ser Ser Ser Pro Ser Ser Ser Ser Ser Ser Ser Ser Ser
1 5 10 15

Ser Ser Ser Ser Ser Ser Cys Lys Asn Lys Asn Lys Lys Ser Lys Ile
20 25 30

Lys Lys Tyr Lys Gly Val Arg Met Arg Ser Trp Gly Ser Trp Val Ser
35 40 45

Glu Ile Arg Ala Pro Asn Gln Lys Thr Arg Ile Trp Leu Gly Ser Tyr
50 55 60

Ser Thr Ala Glu Ala Ala Ala Arg Ala Tyr Asp Val Ala Leu Leu Cys
65 70 75 80

Leu Lys Gly Pro Gln Ala Asn Leu Asn Phe Pro Thr Ser Ser Ser Ser
85 90 95

His His Leu Leu Asp Asn Leu Leu Asp Glu Asn Thr Leu Leu Ser Pro
100 105 110

Lys Ser Ile Gln Arg Val Ala Ala Gln Ala Ala Asn Ser Phe Asn His
115 120 125

Phe Ala Pro Thr Ser Ser Ala Val Ser Ser Pro Ser Asp His Asp His
130 135 140

His His Asp Asp Gly Met Gln Ser Leu Met Gly Ser Phe Val Asp Asn
145 150 155 160

His Val Ser Leu Met Asp Ser Thr Ser Ser Trp Tyr Asp Asp His Asn
165 170 175

Gly Met Phe Leu Phe Asp Asn Gly Ala Pro Phe Asn Tyr Ser Pro Gln
180 185 190

Leu Asn Ser Thr Thr Met Leu Asp Glu Tyr Phe Tyr Glu Asp Ala Asp
195 200 205

Ile Pro Leu Trp Ser Phe Asn
210 215



(2) INFORMATION FOR SEQ ID NO: 100:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 176 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..176

(D) OTHER INFORMATION: / Ceres Seq. ID 1581041
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:
Met Arg Ser Trp Gly Ser Trp Val Ser Glu Ile Arg Ala Pro Asn Gln
1 5 10 15

Lys Thr Arg Ile Trp Leu Gly Ser Tyr Ser Thr Ala Glu Ala Ala Ala
20 25 30

Arg Ala Tyr Asp Val Ala Leu Leu Cys Leu Lys Gly Pro Gln Ala Asn
35 40 45

Leu Asn Phe Pro Thr Ser Ser Ser Ser His His Leu Leu Asp Asn Leu
50 55 60

Leu Asp Glu Asn Thr Leu Leu Ser Pro Lys Ser Ile Gln Arg Val Ala
65 70 75 80

Ala Gln Ala Ala Asn Ser Phe Asn His Phe Ala Pro Thr Ser Ser Ala
85 90 95

Val Ser Ser Pro Ser Asp His Asp His His His Asp Asp Gly Met Gln
100 105 110

Ser Leu Met Gly Ser Phe Val Asp Asn His Val Ser Leu Met Asp Ser
115 120 125

Thr Ser Ser Trp Tyr Asp Asp His Asn Gly Met Phe Leu Phe Asp Asn
130 135 140

Gly Ala Pro Phe Asn Tyr Ser Pro Gln Leu Asn Ser Thr Thr Met Leu
145 150 155 160

Asp Glu Tyr Phe Tyr Glu Asp Ala Asp Ile Pro Leu Trp Ser Phe Asn
165 170 175



(2) INFORMATION FOR SEQ ID NO: 101: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 505 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..505

(D) OTHER INFORMATION: / Ceres Seq. ID 1581046
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
attttttatt tgttgttctg tgtcagatta tggggagcac agacgcagag atcgcttcag 
aacttcgaga ttggtggtga ccgcgagcga atgcccgaac caatcgtccg agcttttggt 
gtcttgaaga aatgtgctgc caaggttaac atggagtatg gtcttgatcc aatgattggg 
gaagccataa tggaagctgc acaagaagta gcagaaggaa agctcaatga tcatttccct 
cttgttgtat ggcaaactgg tagtgggacg cagagtaata tgaatgctaa tgaggtcatt 
gccaatagag cagctgagat tcttggtcac aaacgtggtg aaaaaattgt gcacccaaat 
gaccatgtga acagatcaca atcttctaat gacacttttc caactgtcat gcacattgca 
gctgcaaccg agattacttc gaggctaatc cctagtttga aaaatttgca tagctctttg 
gaatctaagt ccttcgagtt taaag 
(2) INFORMATION FOR SEQ ID NO: 102:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 168 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..168

(D) OTHER INFORMATION: / Ceres Seq. ID 1581047
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
Ile Phe Tyr Leu Leu Phe Cys Val Arg Leu Trp Gly Ala Gln Thr Gln
1 5 10 15

Arg Ser Leu Gln Asn Phe Glu Ile Gly Gly Asp Arg Glu Arg Met Pro
20 25 30

Glu Pro Ile Val Arg Ala Phe Gly Val Leu Lys Lys Cys Ala Ala Lys
35 40 45

Val Asn Met Glu Tyr Gly Leu Asp Pro Met Ile Gly Glu Ala Ile Met
50 55 60

Glu Ala Ala Gln Glu Val Ala Glu Gly Lys Leu Asn Asp His Phe Pro
65 70 75 80

Leu Val Val Trp Gln Thr Gly Ser Gly Thr Gln Ser Asn Met Asn Ala
85 90 95

Asn Glu Val Ile Ala Asn Arg Ala Ala Glu Ile Leu Gly His Lys Arg
100 105 110

Gly Glu Lys Ile Val His Pro Asn Asp His Val Asn Arg Ser Gln Ser
115 120 125

Ser Asn Asp Thr Phe Pro Thr Val Met His Ile Ala Ala Ala Thr Glu
130 135 140

Ile Thr Ser Arg Leu Ile Pro Ser Leu Lys Asn Leu His Ser Ser Leu
145 150 155 160

Glu Ser Lys Ser Phe Glu Phe Lys
165

(2) INFORMATION FOR SEQ ID NO: 103:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 138 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..138

(D) OTHER INFORMATION: / Ceres Seq. ID 1581048
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:
Met Pro Glu Pro Ile Val Arg Ala Phe Gly Val Leu Lys Lys Cys Ala
1 5 10 15

Ala Lys Val Asn Met Glu Tyr Gly Leu Asp Pro Met Ile Gly Glu Ala
20 25 30

Ile Met Glu Ala Ala Gln Glu Val Ala Glu Gly Lys Leu Asn Asp His
35 40 45

Phe Pro Leu Val Val Trp Gln Thr Gly Ser Gly Thr Gln Ser Asn Met
50 55 60

Asn Ala Asn Glu Val Ile Ala Asn Arg Ala Ala Glu Ile Leu Gly His
65 70 75 80

Lys Arg Gly Glu Lys Ile Val His Pro Asn Asp His Val Asn Arg Ser
85 90 95

Gln Ser Ser Asn Asp Thr Phe Pro Thr Val Met His Ile Ala Ala Ala
100 105 110

Thr Glu Ile Thr Ser Arg Leu Ile Pro Ser Leu Lys Asn Leu His Ser
115 120 125

Ser Leu Glu Ser Lys Ser Phe Glu Phe Lys
130 135

(2) INFORMATION FOR SEQ ID NO: 104:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 118 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..118

(D) OTHER INFORMATION: / Ceres Seq. ID 1581049
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:
Met Glu Tyr Gly Leu Asp Pro Met Ile Gly Glu Ala Ile Met Glu Ala
1 5 10 15

Ala Gln Glu Val Ala Glu Gly Lys Leu Asn Asp His Phe Pro Leu Val
20 25 30

Val Trp Gln Thr Gly Ser Gly Thr Gln Ser Asn Met Asn Ala Asn Glu
35 40 45

Val Ile Ala Asn Arg Ala Ala Glu Ile Leu Gly His Lys Arg Gly Glu
50 55 60

Lys Ile Val His Pro Asn Asp His Val Asn Arg Ser Gln Ser Ser Asn
65 70 75 80

Asp Thr Phe Pro Thr Val Met His Ile Ala Ala Ala Thr Glu Ile Thr
85 90 95

Ser Arg Leu Ile Pro Ser Leu Lys Asn Leu His Ser Ser Leu Glu Ser
100 105 110

Lys Ser Phe Glu Phe Lys
115

(2) INFORMATION FOR SEQ ID NO: 105:

Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003



Table 2 
Page 58 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..441 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581056 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgtttggg 420 
acccacagcc agctcctaag g 
(2) INFORMATION FOR SEQ ID NO: 106:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1581057
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 107:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1581058
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 108: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 521 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..521 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581067 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 
actgattatt gttttaaggc aaattaagat catcttcaaa atcttctcag atctcttcca 
attttctaga aaaaacatgt cttgctgtgg tggaagctgt ggttgtggat ctgcctgcaa 
gtgcggcaat ggttgcggag gttgcaaaag gtaccctgac ttggagaaca ccgccaccga 
gactcttgtc ctcggtgttg ctccggcgat gaactctcag tacgaggctt ccggcgagac 
tttcgttgcc gagaatgatg cctgcaaatg cggatctgac tgcaagtgca acccttgtac 
ctgcaaatga agaacttcat aaaccctaag tctgtaataa ccctaatgtt atgttaggtt 
tgcttatatg taataattgg ctgatttttc cggtagtttt gccggcgacg ttggtctttc 
tcttcttctt cttcttctgt gtgttctgtg gcaagtacgg agtgangcgw aaggctgttg 
gtatctgggg ttgcaaggat tgtggcaagg tcaaggcagg t 
(2) INFORMATION FOR SEQ ID NO: 109:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 102 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..102

(D) OTHER INFORMATION: / Ceres Seq. ID 1581068
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe Ser
1 5 10 15

Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly Ser
20 25 30

Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly Cys
35 40 45

Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val Leu
50 55 60

Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu Thr
65 70 75 80

Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys Cys
85 90 95

Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 110:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1581069
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..57

(D) OTHER INFORMATION: / Ceres Seq. ID 1581070
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
Met Leu Gly Leu Leu Ile Cys Asn Asn Trp Leu Ile Phe Pro Val Val
1 5 10 15

Leu Pro Ala Thr Leu Val Phe Leu Phe Phe Phe Phe Phe Cys Val Phe
20 25 30

Cys Gly Lys Tyr Gly Val Xaa Xaa Lys Ala Val Gly Ile Trp Gly Cys
35 40 45

Lys Asp Cys Gly Lys Val Lys Ala Gly
50 55
(2) INFORMATION FOR SEQ ID NO: 112:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 462 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..462

(D) OTHER INFORMATION: / Ceres Seq. ID 1581075
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
cattgggaca ccactttctg gagtgagagg cttcgtgaat ccaaatacga tattaatgag 60
gaagaactac gtccatactt ctcactgcca aaggttatgg atggactttt cagtctagct 120
aagacacttt ttggaattga tattgaacca gcagatggtt tagctccggt ctggaacaat 180
gatgttcggt tctaccgcgt caaagattct tctgggaacc caatcgctta cttttacttt 240
gatccatact ctcgcccgtc agagaaaaga ggtggcgctt ggatggatga ggttgtttcc 300
cgtagccgag tcatggctca gaaaggctct tctgttaggc tgcctgttgc acacatggtc 360
tgcaaccaaa ctccaccagt aggtgacaag ccaagcctta tgacattccg tgagtagaga 420
cagtatttca tgaatttgga catgctcttc agccacatgc tt
(2) INFORMATION FOR SEQ ID NO: 113:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 138 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..138

(D) OTHER INFORMATION: / Ceres Seq. ID 1581076
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
His Trp Asp Thr Thr Phe Trp Ser Glu Arg Leu Arg Glu Ser Lys Tyr
1 5 10 15

Asp Ile Asn Glu Glu Glu Leu Arg Pro Tyr Phe Ser Leu Pro Lys Val
20 25 30

Met Asp Gly Leu Phe Ser Leu Ala Lys Thr Leu Phe Gly Ile Asp Ile
35 40 45

Glu Pro Ala Asp Gly Leu Ala Pro Val Trp Asn Asn Asp Val Arg Phe
50 55 60

Tyr Arg Val Lys Asp Ser Ser Gly Asn Pro Ile Ala Tyr Phe Tyr Phe
65 70 75 80

Asp Pro Tyr Ser Arg Pro Ser Glu Lys Arg Gly Gly Ala Trp Met Asp
85 90 95

Glu Val Val Ser Arg Ser Arg Val Met Ala Gln Lys Gly Ser Ser Val
100 105 110

Arg Leu Pro Val Ala His Met Val Cys Asn Gln Thr Pro Pro Val Gly
115 120 125

Asp Lys Pro Ser Leu Met Thr Phe Arg Glu
130 135

(2) INFORMATION FOR SEQ ID NO: 114:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 106 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..106

(D) OTHER INFORMATION: / Ceres Seq. ID 1581077
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
Met Asp Gly Leu Phe Ser Leu Ala Lys Thr Leu Phe Gly Ile Asp Ile
1 5 10 15

Glu Pro Ala Asp Gly Leu Ala Pro Val Trp Asn Asn Asp Val Arg Phe
20 25 30

Tyr Arg Val Lys Asp Ser Ser Gly Asn Pro Ile Ala Tyr Phe Tyr Phe
35 40 45

Asp Pro Tyr Ser Arg Pro Ser Glu Lys Arg Gly Gly Ala Trp Met Asp
50 55 60

Glu Val Val Ser Arg Ser Arg Val Met Ala Gln Lys Gly Ser Ser Val
65 70 75 80

Arg Leu Pro Val Ala His Met Val Cys Asn Gln Thr Pro Pro Val Gly
85 90 95

Asp Lys Pro Ser Leu Met Thr Phe Arg Glu
100 105

(2) INFORMATION FOR SEQ ID NO: 115:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 485 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..485

(D) OTHER INFORMATION: / Ceres Seq. ID 1581118
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggta 
aaatcggaga ggttactaag attttcactc acaacagtac cattgtgatc aaagatgtga 
acctc 

(2) INFORMATION FOR SEQ ID NO: 116:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1581119
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 117:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1581120
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 118:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 468 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..468

(D) OTHER INFORMATION: / Ceres Seq. ID 1581137
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
gattattgtt ttaaggcaaa ttaagatcat cttcaaaatc ttctcagatc tcttccaatt 
ttctagaaag aaacatgtct tgctgtggtg gaagctgtgg ttgtggatct gcctgcaagt 
gcggcaatgg ttgcggaggt tgcaaaaggt accctgactt ggagaacacc gccaccgaga 
ctcttgtcct cggtgttgct ccggcgatga actctcagta cgaggcttcc ggcgagactt 
tcgttgccga gaatgatgcc tgcaaatgcg gatctgactg caagtgcaac ccttgtacct 
gcaaatgaag aacttcataa accctaagtc tgtaataacc ctaatgttat gttaggtttg 
cttatatgta ataattggct gatttttccg gtagttttgc cggcgacgtt ggtctttctc 
ttcttcttct agatctcttc tttggtgatg tgtatatata tacacatc 
(2) INFORMATION FOR SEQ ID NO: 119:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..51

(D) OTHER INFORMATION: / Ceres Seq. ID 1581138
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe Ser Asp
1 5 10 15

Leu Phe Gln Phe Ser Arg Lys Lys His Val Leu Leu Trp Trp Lys Leu
20 25 30

Trp Leu Trp Ile Cys Leu Gln Val Arg Gln Trp Leu Arg Arg Leu Gln
35 40 45

Lys Val Pro
50

(2) INFORMATION FOR SEQ ID NO: 120:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1581139
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 121:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1181 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1181

(D) OTHER INFORMATION: / Ceres Seq. ID 1581140
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:
aaacatcaaa gcaattaaca aatacacaac ttgtaacttc aaatagatta ctttcaaaga 60
agagagagag aaagaaagat ggagaagaac atgaagtttc cagtagtaga cttgtccaag 120
ctcaatgggg aagagagaga ccaaaccatg gctctaatca atgaagcttg tgagaattgg 180
ggcttctttg agatagtgaa ccatggatta ccacatgact taatggacaa gatcgagaag 240
atgacaaagg accattacaa gacatgccaa gaacaaaagt tcaatgacat gctcaagtcc 300
aaaggtttgg ataatcttga gacagaagtc gaagatgtcg attgggaaag cactttctac 360
gttcgtcacc tccctcaatc caatctcaat gacatttcag atgtgtctga tgaatacagg 420
acggccatga aagactttgg taagagactg gagaatcttg ctgaggattt gttggatcta 480
ctgtgtgaga atctagggtt agagaaaggg tatttgaaga aagtgtttca tggaacaaaa 540
ggcccaacct ttgggacaaa ggtgagcaat tatccaccat gtcctaaacc agagatgatc 600
aaaggtctta gggcccacac tgatgcagga ggcatcatct tgttgtttca agacgacaag 660
gtcagtggtc tccagcttct taaagatggt gactggattg atgttcctcc tctcaaccac 720
tctattgtca tcaatcttgg tgaccaactt gaggtgataa ccaacgggaa gtataagagt 780
gtgctgcacc gtgtggtgac tcaacaagaa ggaaacagga tgtcggttgc atcgttttac 840
aacccgggaa gcgatgcgga gatctcacca gctacttcgc ttgtcgagaa agattccgag 900
tacccgagtt tcgtctttga tgactacatg aagctttatg caggggtcaa gtttcagccc 960
aaggagccac ggttcgcagc aatgaagaat gcttctgcag ttacagaact gaatcctaca 1020
gcagccgtag agactttcta aaaatggatt tgagattcaa gtgaagcaga gaaagaaaga 1080
tgagtttgtg ttgtgtgtta tggcaataag ttaaaacttg tattagtgtt gattaattgt 1140
tggtcaattg gtgtgtttta aagtgtgggg tgtttatgtt t
(2) INFORMATION FOR SEQ ID NO: 122:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 320 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..320

(D) OTHER INFORMATION: / Ceres Seq. ID 1581141
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
Met Glu Lys Asn Met Lys Phe Pro Val Val Asp Leu Ser Lys Leu Asn
1 5 10 15

Gly Glu Glu Arg Asp Gln Thr Met Ala Leu Ile Asn Glu Ala Cys Glu
20 25 30

Asn Trp Gly Phe Phe Glu Ile Val Asn His Gly Leu Pro His Asp Leu
35 40 45

Met Asp Lys Ile Glu Lys Met Thr Lys Asp His Tyr Lys Thr Cys Gln
50 55 60

Glu Gln Lys Phe Asn Asp Met Leu Lys Ser Lys Gly Leu Asp Asn Leu
65 70 75 80

Glu Thr Glu Val Glu Asp Val Asp Trp Glu Ser Thr Phe Tyr Val Arg
85 90 95

His Leu Pro Gln Ser Asn Leu Asn Asp Ile Ser Asp Val Ser Asp Glu
100 105 110

Tyr Arg Thr Ala Met Lys Asp Phe Gly Lys Arg Leu Glu Asn Leu Ala
115 120 125

Glu Asp Leu Leu Asp Leu Leu Cys Glu Asn Leu Gly Leu Glu Lys Gly
130 135 140

Tyr Leu Lys Lys Val Phe His Gly Thr Lys Gly Pro Thr Phe Gly Thr
145 150 155 160

Lys Val Ser Asn Tyr Pro Pro Cys Pro Lys Pro Glu Met Ile Lys Gly
165 170 175

Leu Arg Ala His Thr Asp Ala Gly Gly Ile Ile Leu Leu Phe Gln Asp
180 185 190

Asp Lys Val Ser Gly Leu Gln Leu Leu Lys Asp Gly Asp Trp Ile Asp
195 200 205

Val Pro Pro Leu Asn His Ser Ile Val Ile Asn Leu Gly Asp Gln Leu
210 215 220

Glu Val Ile Thr Asn Gly Lys Tyr Lys Ser Val Leu His Arg Val Val
225 230 235 240

Thr Gln Gln Glu Gly Asn Arg Met Ser Val Ala Ser Phe Tyr Asn Pro
245 250 255

Gly Ser Asp Ala Glu Ile Ser Pro Ala Thr Ser Leu Val Glu Lys Asp
260 265 270

Ser Glu Tyr Pro Ser Phe Val Phe Asp Asp Tyr Met Lys Leu Tyr Ala
275 280 285

Gly Val Lys Phe Gln Pro Lys Glu Pro Arg Phe Ala Ala Met Lys Asn
290 295 300

Ala Ser Ala Val Thr Glu Leu Asn Pro Thr Ala Ala Val Glu Thr Phe
305 310 315 320

(2) INFORMATION FOR SEQ ID NO: 123:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 316 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..316
(D) OTHER INFORMATION: / Ceres Seq. ID 1581142
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:

Met Lys Phe Pro Val Val Asp Leu Ser Lys Leu Asn Gly Glu Glu Arg
1 5 10 15

Asp Gln Thr Met Ala Leu Ile Asn Glu Ala Cys Glu Asn Trp Gly Phe
20 25 30

Phe Glu Ile Val Asn His Gly Leu Pro His Asp Leu Met Asp Lys Ile
35 40 45

Glu Lys Met Thr Lys Asp His Tyr Lys Thr Cys Gln Glu Gln Lys Phe
50 55 60

Asn Asp Met Leu Lys Ser Lys Gly Leu Asp Asn Leu Glu Thr Glu Val
65 70 75 80

Glu Asp Val Asp Trp Glu Ser Thr Phe Tyr Val Arg His Leu Pro Gln
85 90 95

Ser Asn Leu Asn Asp Ile Ser Asp Val Ser Asp Glu Tyr Arg Thr Ala
100 105 110

Met Lys Asp Phe Gly Lys Arg Leu Glu Asn Leu Ala Glu Asp Leu Leu
115 120 125

Asp Leu Leu Cys Glu Asn Leu Gly Leu Glu Lys Gly Tyr Leu Lys Lys
130 135 140

Val Phe His Gly Thr Lys Gly Pro Thr Phe Gly Thr Lys Val Ser Asn
145 150 155 160

Tyr Pro Pro Cys Pro Lys Pro Glu Met Ile Lys Gly Leu Arg Ala His
165 170 175

Thr Asp Ala Gly Gly Ile Ile Leu Leu Phe Gln Asp Asp Lys Val Ser
180 185 190

Gly Leu Gln Leu Leu Lys Asp Gly Asp Trp Ile Asp Val Pro Pro Leu
195 200 205

Asn His Ser Ile Val Ile Asn Leu Gly Asp Gln Leu Glu Val Ile Thr
210 215 220

Asn Gly Lys Tyr Lys Ser Val Leu His Arg Val Val Thr Gln Gln Glu
225 230 235 240

Gly Asn Arg Met Ser Val Ala Ser Phe Tyr Asn Pro Gly Ser Asp Ala
245 250 255

Glu Ile Ser Pro Ala Thr Ser Leu Val Glu Lys Asp Ser Glu Tyr Pro
260 265 270

Ser Phe Val Phe Asp Asp Tyr Met Lys Leu Tyr Ala Gly Val Lys Phe
275 280 285

Gln Pro Lys Glu Pro Arg Phe Ala Ala Met Lys Asn Ala Ser Ala Val
290 295 300

Thr Glu Leu Asn Pro Thr Ala Ala Val Glu Thr Phe
305 310 315

(2) INFORMATION FOR SEQ ID NO: 124:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 297 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..297
(D) OTHER INFORMATION: / Ceres Seq. ID 1581143
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

Met Ala Leu Ile Asn Glu Ala Cys Glu Asn Trp Gly Phe Phe Glu Ile
1 5 10 15

Val Asn His Gly Leu Pro His Asp Leu Met Asp Lys Ile Glu Lys Met
20 25 30

Thr Lys Asp His Tyr Lys Thr Cys Gln Glu Gln Lys Phe Asn Asp Met
35 40 45

Leu Lys Ser Lys Gly Leu Asp Asn Leu Glu Thr Glu Val Glu Asp Val
50 55 60

Asp Trp Glu Ser Thr Phe Tyr Val Arg His Leu Pro Gln Ser Asn Leu
65 70 75 80

Asn Asp Ile Ser Asp Val Ser Asp Glu Tyr Arg Thr Ala Met Lys Asp
85 90 95

Phe Gly Lys Arg Leu Glu Asn Leu Ala Glu Asp Leu Leu Asp Leu Leu
100 105 110

Cys Glu Asn Leu Gly Leu Glu Lys Gly Tyr Leu Lys Lys Val Phe His
115 120 125

Gly Thr Lys Gly Pro Thr Phe Gly Thr Lys Val Ser Asn Tyr Pro Pro
130 135 140

Cys Pro Lys Pro Glu Met Ile Lys Gly Leu Arg Ala His Thr Asp Ala
145 150 155 160

Gly Gly Ile Ile Leu Leu Phe Gln Asp Asp Lys Val Ser Gly Leu Gln
165 170 175

Leu Leu Lys Asp Gly Asp Trp Ile Asp Val Pro Pro Leu Asn His Ser
180 185 190

Ile Val Ile Asn Leu Gly Asp Gln Leu Glu Val Ile Thr Asn Gly Lys
195 200 205

Tyr Lys Ser Val Leu His Arg Val Val Thr Gln Gln Glu Gly Asn Arg
210 215 220

Met Ser Val Ala Ser Phe Tyr Asn Pro Gly Ser Asp Ala Glu Ile Ser
225 230 235 240

Pro Ala Thr Ser Leu Val Glu Lys Asp Ser Glu Tyr Pro Ser Phe Val
245 250 255

Phe Asp Asp Tyr Met Lys Leu Tyr Ala Gly Val Lys Phe Gln Pro Lys
260 265 270

Glu Pro Arg Phe Ala Ala Met Lys Asn Ala Ser Ala Val Thr Glu Leu
275 280 285

Asn Pro Thr Ala Ala Val Glu Thr Phe
290 295

(2) INFORMATION FOR SEQ ID NO: 125:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1202 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..1202
(D) OTHER INFORMATION: / Ceres Seq. ID 1581172
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:
atcattctcg gtttaactgt gaaacacata ataaaacaaa gagaaagaga tataatatgg 60
gtgggtgggc aatcgcagta cacggtggtg ccggtatcga ccctaatctt ccggcagaga 120
gacaagaaga ggcgaaacag cttttaactc gttgtctcaa cctcggcata atagctttgc 180
gttccaatgt ttccgccatt gacgtcgttg agctcgtcat tagagaattg gagacggatc 240
ctctgtttaa ttcaggccgt ggatctgctt tgacggagaa aggaacggtt gagatggaag 300
ctagcattat ggacggtacg aagagacgat gcggtgccgt ttcggggata accaccgtga 360
aaaatcctat atctcttgct cgtctcgtca tggacaaatc tccccactct taccttgctt 420
tctcaggtgc agaggatttc gcccgcaaac agggagttga aattgtggac aacgagtact 480
ttgtcacgga cgacaacgta ggaatgctca agttggccaa ggaagctaac tccatcttgt 540
ttgattaccg gattccgccg atgggatgtg ccggcgcagc tgcgaccgac agtccaatcc 600
aaatgaacgg tcttccgatc agcatttacg caccggagac agtcgggtgc gttgtggttg 660
acgggaaagg acattgtgcc gccgggacat ccacgggtgg tttaatgaac aagatgatgg 720
gaaggattgg tgactcgccg ctgataggag ccgggacgta tgcgtcggag ttttgtggtg 780
tgtcgtgtac cggagaagga gaagccatta taagagcaac cctagctcgt gatgtgtcag 840
ctgttatgga gtataaagga cttaacctcc aagaagcggt tgattacgtc atcaagcatc 900
gacttgacga agggttcgct ggactcattg ctgtctcgaa taaaggagag gtggtttgtg 960
gttttaactc taatgggatg ttcaggggat gtgcaactga ggatggattc atggaggttg 1020
ctatttggga gtgagaaata ttttagatta agaaaatgtc ttactagtat ttaatcagtc 1080
atcgctctat taatttggtt attcattatc ataaagctgg agtagtaaat ttagttctgt 1140
cgttatcacc agtcctatat tgatttgtgt ttaatgcggt ttcaaatggt tagttcgtgt 1200
tt

(2) INFORMATION FOR SEQ ID NO: 126:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 343 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..343
(D) OTHER INFORMATION: / Ceres Seq. ID 1581173
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

His Ser Arg Phe Asn Cys Glu Thr His Asn Lys Thr Lys Arg Lys Arg
1 5 10 15

Tyr Asn Met Gly Gly Trp Ala Ile Ala Val His Gly Gly Ala Gly Ile
20 25 30

Asp Pro Asn Leu Pro Ala Glu Arg Gln Glu Glu Ala Lys Gln Leu Leu
35 40 45

Thr Arg Cys Leu Asn Leu Gly Ile Ile Ala Leu Arg Ser Asn Val Ser
50 55 60

Ala Ile Asp Val Val Glu Leu Val Ile Arg Glu Leu Glu Thr Asp Pro
65 70 75 80

Leu Phe Asn Ser Gly Arg Gly Ser Ala Leu Thr Glu Lys Gly Thr Val
85 90 95

Glu Met Glu Ala Ser Ile Met Asp Gly Thr Lys Arg Arg Cys Gly Ala
100 105 110

Val Ser Gly Ile Thr Thr Val Lys Asn Pro Ile Ser Leu Ala Arg Leu
115 120 125

Val Met Asp Lys Ser Pro His Ser Tyr Leu Ala Phe Ser Gly Ala Glu
130 135 140

Asp Phe Ala Arg Lys Gln Gly Val Glu Ile Val Asp Asn Glu Tyr Phe
145 150 155 160

Val Thr Asp Asp Asn Val Gly Met Leu Lys Leu Ala Lys Glu Ala Asn
165 170 175

Ser Ile Leu Phe Asp Tyr Arg Ile Pro Pro Met Gly Cys Ala Gly Ala
180 185 190

Ala Ala Thr Asp Ser Pro Ile Gln Met Asn Gly Leu Pro Ile Ser Ile
195 200 205

Tyr Ala Pro Glu Thr Val Gly Cys Val Val Val Asp Gly Lys Gly His
210 215 220

Cys Ala Ala Gly Thr Ser Thr Gly Gly Leu Met Asn Lys Met Met Gly
225 230 235 240

Arg Ile Gly Asp Ser Pro Leu Ile Gly Ala Gly Thr Tyr Ala Ser Glu
245 250 255

Phe Cys Gly Val Ser Cys Thr Gly Glu Gly Glu Ala Ile Ile Arg Ala
260 265 270

Thr Leu Ala Arg Asp Val Ser Ala Val Met Glu Tyr Lys Gly Leu Asn
275 280 285

Leu Gln Glu Ala Val Asp Tyr Val Ile Lys His Arg Leu Asp Glu Gly
290 295 300

Phe Ala Gly Leu Ile Ala Val Ser Asn Lys Gly Glu Val Val Cys Gly
305 310 315 320

Phe Asn Ser Asn Gly Met Phe Arg Gly Cys Ala Thr Glu Asp Gly Phe
325 330 335

Met Glu Val Ala Ile Trp Glu
340

(2) INFORMATION FOR SEQ ID NO: 127:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 325 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..325
(D) OTHER INFORMATION: / Ceres Seq. ID 1581174
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:

Met Gly Gly Trp Ala Ile Ala Val His Gly Gly Ala Gly Ile Asp Pro
1 5 10 15

Asn Leu Pro Ala Glu Arg Gln Glu Glu Ala Lys Gln Leu Leu Thr Arg
20 25 30

Cys Leu Asn Leu Gly Ile Ile Ala Leu Arg Ser Asn Val Ser Ala Ile
35 40 45

Asp Val Val Glu Leu Val Ile Arg Glu Leu Glu Thr Asp Pro Leu Phe
50 55 60

Asn Ser Gly Arg Gly Ser Ala Leu Thr Glu Lys Gly Thr Val Glu Met
65 70 75 80

Glu Ala Ser Ile Met Asp Gly Thr Lys Arg Arg Cys Gly Ala Val Ser
85 90 95

Gly Ile Thr Thr Val Lys Asn Pro Ile Ser Leu Ala Arg Leu Val Met
100 105 110

Asp Lys Ser Pro His Ser Tyr Leu Ala Phe Ser Gly Ala Glu Asp Phe
115 120 125

Ala Arg Lys Gln Gly Val Glu Ile Val Asp Asn Glu Tyr Phe Val Thr
130 135 140

Asp Asp Asn Val Gly Met Leu Lys Leu Ala Lys Glu Ala Asn Ser Ile
145 150 155 160

Leu Phe Asp Tyr Arg Ile Pro Pro Met Gly Cys Ala Gly Ala Ala Ala
165 170 175

Thr Asp Ser Pro Ile Gln Met Asn Gly Leu Pro Ile Ser Ile Tyr Ala
180 185 190

Pro Glu Thr Val Gly Cys Val Val Val Asp Gly Lys Gly His Cys Ala
195 200 205

Ala Gly Thr Ser Thr Gly Gly Leu Met Asn Lys Met Met Gly Arg Ile
210 215 220

Gly Asp Ser Pro Leu Ile Gly Ala Gly Thr Tyr Ala Ser Glu Phe Cys
225 230 235 240

Gly Val Ser Cys Thr Gly Glu Gly Glu Ala Ile Ile Arg Ala Thr Leu
245 250 255

Ala Arg Asp Val Ser Ala Val Met Glu Tyr Lys Gly Leu Asn Leu Gln
260 265 270

Glu Ala Val Asp Tyr Val Ile Lys His Arg Leu Asp Glu Gly Phe Ala
275 280 285

Gly Leu Ile Ala Val Ser Asn Lys Gly Glu Val Val Cys Gly Phe Asn
290 295 300

Ser Asn Gly Met Phe Arg Gly Cys Ala Thr Glu Asp Gly Phe Met Glu
305 310 315 320

Val Ala Ile Trp Glu
325

(2) INFORMATION FOR SEQ ID NO: 128:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 246 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..246
(D) OTHER INFORMATION: / Ceres Seq. ID 1581175
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:

Met Glu Ala Ser Ile Met Asp Gly Thr Lys Arg Arg Cys Gly Ala Val
1 5 10 15

Ser Gly Ile Thr Thr Val Lys Asn Pro Ile Ser Leu Ala Arg Leu Val
20 25 30

Met Asp Lys Ser Pro His Ser Tyr Leu Ala Phe Ser Gly Ala Glu Asp
35 40 45

Phe Ala Arg Lys Gln Gly Val Glu Ile Val Asp Asn Glu Tyr Phe Val
50 55 60

Thr Asp Asp Asn Val Gly Met Leu Lys Leu Ala Lys Glu Ala Asn Ser
65 70 75 80

Ile Leu Phe Asp Tyr Arg Ile Pro Pro Met Gly Cys Ala Gly Ala Ala
85 90 95

Ala Thr Asp Ser Pro Ile Gln Met Asn Gly Leu Pro Ile Ser Ile Tyr
100 105 110

Ala Pro Glu Thr Val Gly Cys Val Val Val Asp Gly Lys Gly His Cys
115 120 125

Ala Ala Gly Thr Ser Thr Gly Gly Leu Met Asn Lys Met Met Gly Arg
130 135 140

Ile Gly Asp Ser Pro Leu Ile Gly Ala Gly Thr Tyr Ala Ser Glu Phe
145 150 155 160

Cys Gly Val Ser Cys Thr Gly Glu Gly Glu Ala Ile Ile Arg Ala Thr
165 170 175

Leu Ala Arg Asp Val Ser Ala Val Met Glu Tyr Lys Gly Leu Asn Leu
180 185 190

Gln Glu Ala Val Asp Tyr Val Ile Lys His Arg Leu Asp Glu Gly Phe
195 200 205

Ala Gly Leu Ile Ala Val Ser Asn Lys Gly Glu Val Val Cys Gly Phe
210 215 220

Asn Ser Asn Gly Met Phe Arg Gly Cys Ala Thr Glu Asp Gly Phe Met
225 230 235 240

Glu Val Ala Ile Trp Glu
245

(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 480 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..480 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581200 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60 
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120 
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tganttggag aacaccgcca 180 
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240 
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300 
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360 
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgtagcga 420 
tcgccgtcat cttgactctc ccttacccaa tgctcctgaa gaaacgcatc gtatcactta 480 

(2) INFORMATION FOR SEQ ID NO: 130:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 103 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..103
(D) OTHER INFORMATION: / Ceres Seq. ID 1581201
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Xaa Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 131:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..77
(D) OTHER INFORMATION: / Ceres Seq. ID 1581202
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Xaa Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 132:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 420 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..420
(D) OTHER INFORMATION: / Ceres Seq. ID 1581205
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
tcactgatta ttgttttaag gcaaattaag atcatcttca aaatcttctc agatctcttc 60
caattttcta gaaaaaacat gtcttgctgt ggtggaagct gtggttgtgg atctgcctgc 120
aagtgcggca atggttgcgg aggttgcaaa aggtaccctg acttggagaa caccgccacc 180
gagactcttg tcctcggtgt tgctccggcg atgaactctc agtacgaggc ttccggcgag 240
actttcgttg ccgagaatga tgcctgcaaa tgcggatctg actgcaagtg caacccttgt 300
acctgcaaat gaagaacttc ataaacccta agtctgtaat aaccctaatg ttatgttagg 360
tttgcttata tgtaataatt ggctgatttt tccggtagtt ttgccggcga acaacaacca 420

(2) INFORMATION FOR SEQ ID NO: 133:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 103 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..103
(D) OTHER INFORMATION: / Ceres Seq. ID 1581206
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:

Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 134:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..77
(D) OTHER INFORMATION: / Ceres Seq. ID 1581207
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 135: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2146 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..2146 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581223 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: 



ctctgtgcct tttttcctct tttctcgttc ccctctactt catttagggt ttatcatcct 60 

ctaccaatcc gccgattcta tacatacact tttcatatat atacatacat cgtctccact 120 

ctcgttggct catttttcat tcaccgccac gaagcaagag gagattcgta ttagccacca 180 

tgtctttgag acctaacgct aagactgagg ttcgccggaa ccgctacaaa gtggcggtgg 240 

atgctgagga aggaagacgg aggagagagg acaacatggt cgagatccgt aagagtaagc 300 

gtgaagaaag cttacagaag aagcgtcgtg aaggccttca agctaatcag ctacctcaat 360 

tcgctccttc ttctgttcct gcttcatcaa ctgttgagaa gaagttggag agtttgcctg 420 

ctatggttgg tggtgtttgg tcagatgata gaagtttaca gcttgaagcg actactcagt 480 

tccggaaatt gctttctatt gagcgtagtc ctccgattga agaagtgatt gatgctggtg 540 

ttgtgcctcg gtttgtggaa tttctcacta gagaggatta tccacagctt cagtttgagg 600 

ctgcttgggc attgacaaac attgcttctg ggacttcaga gaatacaaag gtggtcattg 660 

aacatggtgc tgttccgatt tttgttcaac tattggcttc tcagagtgat gatgtccgtg 720 

agcaggctgt gtgggcatta gggaatgttg ctggtgattc tccacggtgc agagatcttg 780

tccttggtca aggagctttg ataccattac tctctcagtt gaatgagcat gctaaactct 840 

cgatgctgag aaatgctact tggaccctct ctaatttctg caggggcaag ccacagcctc 900 

cctttgacca ggtccgtcca gcactcccag cccttgaacg cctgattcat tcgactgatg 960 

aggaagtgtt aacggatgca tgttgggctc tctcttacct ttccgatggc acaaatgaca 1020 

aaatccaatc tgtcattgag gcaggtgttg ttcctcgact cgtcgagctt ctccagcatc 1080 

aatcaccatc tgtgcttatt cctgcccttc gtagtattgg taacattgtc actggagatg 1140 

atctgcagac acagtgtgta attagtcatg gtgcacttct tagccttttg agccttctga 1200 

ctcacaatca taagaagagc atcaaaaagg aagcttgctg gacaatctca aatatcactg 1260 

ctggaaacag ggaccagatt caggctgtgt gtgaagctgg tttgatttgc cctcttgtca 1320 

atctgcttca aaatgcggag tttgatataa agaaagaagc tgcgtgggca atatcaaatg 1380 

ccacttcagg tggttctcct gaccaaatca agtacatggt ggaacaggga gtcgtgaaac 1440 

cactgtgtga tcttttggtg tgccctgatc caaggattat cactgtgtgt ctggaaggat 1500 

tggagaacat tttgaaagtt ggggaagctg aaaaggtcac aggaaatacc ggggatgtaa 1560

acttttatgc acagctgatc gatgatgctg agggattaga aaagattgag aatctgcaga 1620 

gtcatgacaa cagcgaaatc tatgagaagg cagtgaagat tcttgaaacg tactggttgg 1680

aagaggaaga tgaaacttta ccacctggtg atccctctgc acaaggcttc cagtttggag 1740

gaggtaatga tgcagccgta ccaccaggtg gattcaactt ccagtgaagg agctgaacat 1800

gacgatttga agagtcgaag gagcttaaag agtttggggg cactggttaa aaagggtatg 1860

attagagagt ctggtcttct gtgtaaaagt catatggggt gccttactta gtctggcctg 1920

gtccggtcag atccagtggt tggtgggttt gagtcatgga ataagcttgt tgggggggag 1980

ttgggtagga gtcatcaagg gatatttatt tttgggaaag ggtttataag tttttaaagt 2040

gtgatctgat gagaatgtgt ctctgggttt ttgttacatt tgtactacac aacactgcaa 2100

agtcaaacat aaaaaagttg gtcaaacttt atttggtctt attggc

(2) INFORMATION FOR SEQ ID NO: 136:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 535 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..535
(D) OTHER INFORMATION: / Ceres Seq. ID 1581224
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:

Met Ser Leu Arg Pro Asn Ala Lys Thr Glu Val Arg Arg Asn Arg Tyr
1 5 10 15

Lys Val Ala Val Asp Ala Glu Glu Gly Arg Arg Arg Arg Glu Asp Asn
20 25 30

Met Val Glu Ile Arg Lys Ser Lys Arg Glu Glu Ser Leu Gln Lys Lys
35 40 45

Arg Arg Glu Gly Leu Gln Ala Asn Gln Leu Pro Gln Phe Ala Pro Ser
50 55 60

Ser Val Pro Ala Ser Ser Thr Val Glu Lys Lys Leu Glu Ser Leu Pro
65 70 75 80

Ala Met Val Gly Gly Val Trp Ser Asp Asp Arg Ser Leu Gln Leu Glu
85 90 95

Ala Thr Thr Gln Phe Arg Lys Leu Leu Ser Ile Glu Arg Ser Pro Pro
100 105 110

Ile Glu Glu Val Ile Asp Ala Gly Val Val Pro Arg Phe Val Glu Phe
115 120 125

Leu Thr Arg Glu Asp Tyr Pro Gln Leu Gln Phe Glu Ala Ala Trp Ala
130 135 140

Leu Thr Asn Ile Ala Ser Gly Thr Ser Glu Asn Thr Lys Val Val Ile
145 150 155 160

Glu His Gly Ala Val Pro Ile Phe Val Gln Leu Leu Ala Ser Gln Ser
165 170 175

Asp Asp Val Arg Glu Gln Ala Val Trp Ala Leu Gly Asn Val Ala Gly
180 185 190

Asp Ser Pro Arg Cys Arg Asp Leu Val Leu Gly Gln Gly Ala Leu Ile
195 200 205

Pro Leu Leu Ser Gln Leu Asn Glu His Ala Lys Leu Ser Met Leu Arg
210 215 220

Asn Ala Thr Trp Thr Leu Ser Asn Phe Cys Arg Gly Lys Pro Gln Pro
225 230 235 240

Pro Phe Asp Gln Val Arg Pro Ala Leu Pro Ala Leu Glu Arg Leu Ile
245 250 255

His Ser Thr Asp Glu Glu Val Leu Thr Asp Ala Cys Trp Ala Leu Ser
260 265 270

Tyr Leu Ser Asp Gly Thr Asn Asp Lys Ile Gln Ser Val Ile Glu Ala
275 280 285

Gly Val Val Pro Arg Leu Val Glu Leu Leu Gln His Gln Ser Pro Ser
290 295 300

Val Leu Ile Pro Ala Leu Arg Ser Ile Gly Asn Ile Val Thr Gly Asp
305 310 315 320

Asp Leu Gln Thr Gln Cys Val Ile Ser His Gly Ala Leu Leu Ser Leu
325 330 335



Leu Ser Leu Leu Thr His Asn His Lys Lys Ser Ile Lys Lys Glu Ala
340 345 350

Cys Trp Thr Ile Ser Asn Ile Thr Ala Gly Asn Arg Asp Gln Ile Gln
355 360 365

Ala Val Cys Glu Ala Gly Leu Ile Cys Pro Leu Val Asn Leu Leu Gln
370 375 380

Asn Ala Glu Phe Asp Ile Lys Lys Glu Ala Ala Trp Ala Ile Ser Asn
385 390 395 400

Ala Thr Ser Gly Gly Ser Pro Asp Gln Ile Lys Tyr Met Val Glu Gln
405 410 415

Gly Val Val Lys Pro Leu Cys Asp Leu Leu Val Cys Pro Asp Pro Arg
420 425 430

Ile Ile Thr Val Cys Leu Glu Gly Leu Glu Asn Ile Leu Lys Val Gly
435 440 445

Glu Ala Glu Lys Val Thr Gly Asn Thr Gly Asp Val Asn Phe Tyr Ala
450 455 460

Gln Leu Ile Asp Asp Ala Glu Gly Leu Glu Lys Ile Glu Asn Leu Gln
465 470 475 480

Ser His Asp Asn Ser Glu Ile Tyr Glu Lys Ala Val Lys Ile Leu Glu
485 490 495

Thr Tyr Trp Leu Glu Glu Glu Asp Glu Thr Leu Pro Pro Gly Asp Pro
500 505 510

Ser Ala Gln Gly Phe Gln Phe Gly Gly Gly Asn Asp Ala Ala Val Pro
515 520 525

Pro Gly Gly Phe Asn Phe Gln
530 535





















(2) INFORMATION FOR SEQ ID NO: 137:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 503 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..503
(D) OTHER INFORMATION: / Ceres Seq. ID 1581225
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

Met Val Glu Ile Arg Lys Ser Lys Arg Glu Glu Ser Leu Gln Lys Lys
1 5 10 15

Arg Arg Glu Gly Leu Gln Ala Asn Gln Leu Pro Gln Phe Ala Pro Ser
20 25 30

Ser Val Pro Ala Ser Ser Thr Val Glu Lys Lys Leu Glu Ser Leu Pro
35 40 45

Ala Met Val Gly Gly Val Trp Ser Asp Asp Arg Ser Leu Gln Leu Glu
50 55 60

Ala Thr Thr Gln Phe Arg Lys Leu Leu Ser Ile Glu Arg Ser Pro Pro
65 70 75 80

Ile Glu Glu Val Ile Asp Ala Gly Val Val Pro Arg Phe Val Glu Phe
85 90 95

Leu Thr Arg Glu Asp Tyr Pro Gln Leu Gln Phe Glu Ala Ala Trp Ala
100 105 110

Leu Thr Asn Ile Ala Ser Gly Thr Ser Glu Asn Thr Lys Val Val Ile
115 120 125

Glu His Gly Ala Val Pro Ile Phe Val Gln Leu Leu Ala Ser Gln Ser
130 135 140

Asp Asp Val Arg Glu Gln Ala Val Trp Ala Leu Gly Asn Val Ala Gly
145 150 155 160

Asp Ser Pro Arg Cys Arg Asp Leu Val Leu Gly Gln Gly Ala Leu Ile
165 170 175

Pro Leu Leu Ser Gln Leu Asn Glu His Ala Lys Leu Ser Met Leu Arg
180 185 190

Asn Ala Thr Trp Thr Leu Ser Asn Phe Cys Arg Gly Lys Pro Gln Pro
195 200 205

Pro Phe Asp Gln Val Arg Pro Ala Leu Pro Ala Leu Glu Arg Leu Ile
210 215 220

His Ser Thr Asp Glu Glu Val Leu Thr Asp Ala Cys Trp Ala Leu Ser
225 230 235 240

Tyr Leu Ser Asp Gly Thr Asn Asp Lys Ile Gln Ser Val Ile Glu Ala
245 250 255

Gly Val Val Pro Arg Leu Val Glu Leu Leu Gln His Gln Ser Pro Ser
260 265 270

Val Leu Ile Pro Ala Leu Arg Ser Ile Gly Asn Ile Val Thr Gly Asp
275 280 285

Asp Leu Gln Thr Gln Cys Val Ile Ser His Gly Ala Leu Leu Ser Leu
290 295 300

Leu Ser Leu Leu Thr His Asn His Lys Lys Ser Ile Lys Lys Glu Ala
305 310 315 320

Cys Trp Thr Ile Ser Asn Ile Thr Ala Gly Asn Arg Asp Gln Ile Gln
325 330 335

Ala Val Cys Glu Ala Gly Leu Ile Cys Pro Leu Val Asn Leu Leu Gln
340 345 350

Asn Ala Glu Phe Asp Ile Lys Lys Glu Ala Ala Trp Ala Ile Ser Asn
355 360 365

Ala Thr Ser Gly Gly Ser Pro Asp Gln Ile Lys Tyr Met Val Glu Gln
370 375 380

Gly Val Val Lys Pro Leu Cys Asp Leu Leu Val Cys Pro Asp Pro Arg
385 390 395 400

Ile Ile Thr Val Cys Leu Glu Gly Leu Glu Asn Ile Leu Lys Val Gly
405 410 415

Glu Ala Glu Lys Val Thr Gly Asn Thr Gly Asp Val Asn Phe Tyr Ala
420 425 430

Gln Leu Ile Asp Asp Ala Glu Gly Leu Glu Lys Ile Glu Asn Leu Gln
435 440 445

Ser His Asp Asn Ser Glu Ile Tyr Glu Lys Ala Val Lys Ile Leu Glu
450 455 460

Thr Tyr Trp Leu Glu Glu Glu Asp Glu Thr Leu Pro Pro Gly Asp Pro
465 470 475 480

Ser Ala Gln Gly Phe Gln Phe Gly Gly Gly Asn Asp Ala Ala Val Pro
485 490 495

Pro Gly Gly Phe Asn Phe Gln
500

(2) INFORMATION FOR SEQ ID NO: 138:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 454 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..454
(D) OTHER INFORMATION: / Ceres Seq. ID 1581226
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:

Met Val Gly Gly Val Trp Ser Asp Asp Arg Ser Leu Gln Leu Glu Ala
1 5 10 15

Thr Thr Gln Phe Arg Lys Leu Leu Ser Ile Glu Arg Ser Pro Pro Ile
20 25 30

Glu Glu Val Ile Asp Ala Gly Val Val Pro Arg Phe Val Glu Phe Leu
35 40 45

Thr Arg Glu Asp Tyr Pro Gln Leu Gln Phe Glu Ala Ala Trp Ala Leu
50 55 60

Thr Asn Ile Ala Ser Gly Thr Ser Glu Asn Thr Lys Val Val Ile Glu
65 70 75 80

His Gly Ala Val Pro Ile Phe Val Gln Leu Leu Ala Ser Gln Ser Asp
85 90 95

Asp Val Arg Glu Gln Ala Val Trp Ala Leu Gly Asn Val Ala Gly Asp
100 105 110

Ser Pro Arg Cys Arg Asp Leu Val Leu Gly Gln Gly Ala Leu Ile Pro
115 120 125

Leu Leu Ser Gln Leu Asn Glu His Ala Lys Leu Ser Met Leu Arg Asn
130 135 140

Ala Thr Trp Thr Leu Ser Asn Phe Cys Arg Gly Lys Pro Gln Pro Pro
145 150 155 160

Phe Asp Gln Val Arg Pro Ala Leu Pro Ala Leu Glu Arg Leu Ile His
165 170 175

Ser Thr Asp Glu Glu Val Leu Thr Asp Ala Cys Trp Ala Leu Ser Tyr
180 185 190

Leu Ser Asp Gly Thr Asn Asp Lys Ile Gln Ser Val Ile Glu Ala Gly
195 200 205

Val Val Pro Arg Leu Val Glu Leu Leu Gln His Gln Ser Pro Ser Val
210 215 220

Leu Ile Pro Ala Leu Arg Ser Ile Gly Asn Ile Val Thr Gly Asp Asp
225 230 235 240

Leu Gln Thr Gln Cys Val Ile Ser His Gly Ala Leu Leu Ser Leu Leu
245 250 255

Ser Leu Leu Thr His Asn His Lys Lys Ser Ile Lys Lys Glu Ala Cys
260 265 270

Trp Thr Ile Ser Asn Ile Thr Ala Gly Asn Arg Asp Gln Ile Gln Ala
275 280 285

Val Cys Glu Ala Gly Leu Ile Cys Pro Leu Val Asn Leu Leu Gln Asn
290 295 300

Ala Glu Phe Asp Ile Lys Lys Glu Ala Ala Trp Ala Ile Ser Asn Ala
305 310 315 320

Thr Ser Gly Gly Ser Pro Asp Gln Ile Lys Tyr Met Val Glu Gln Gly
325 330 335

Val Val Lys Pro Leu Cys Asp Leu Leu Val Cys Pro Asp Pro Arg Ile
340 345 350

Ile Thr Val Cys Leu Glu Gly Leu Glu Asn Ile Leu Lys Val Gly Glu
355 360 365

Ala Glu Lys Val Thr Gly Asn Thr Gly Asp Val Asn Phe Tyr Ala Gln
370 375 380

Leu Ile Asp Asp Ala Glu Gly Leu Glu Lys Ile Glu Asn Leu Gln Ser
385 390 395 400

His Asp Asn Ser Glu Ile Tyr Glu Lys Ala Val Lys Ile Leu Glu Thr
405 410 415

Tyr Trp Leu Glu Glu Glu Asp Glu Thr Leu Pro Pro Gly Asp Pro Ser
420 425 430

Ala Gln Gly Phe Gln Phe Gly Gly Gly Asn Asp Ala Ala Val Pro Pro
435 440 445

Gly Gly Phe Asn Phe Gln
450

(2) INFORMATION FOR SEQ ID NO: 139: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 479 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..479

(D) OTHER INFORMATION: / Ceres Seq. ID 158138
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
tccgatgaga gtgtcaaaaa ggttccaaag ttgtcgtata agataagttt ggctagaccg 60
gctttgaagt tcaatgaaga ggctgttctt gctttatact caggagatgt gaaatctgca 120
accgggcttc aaacatactt gttatcaagg gatcattcga atttgaagtc agagttccaa 180
gccagagatg ggaagataac tgtaagctgc attgaaaatg tgcctgttgc gactgtggtt 240
cttggcgaac atctccattt gagtgtcggc gatgacttat tgagcaagag gaacgcatga 300
gccgtccctc cggttttgtt tgatttattg tggtaacttg ggatttattt cccttatttt 360
tcttcttcca tatcttacct ataaactact tatttagatt gttgcaaaaa gttaaatctc 420
agtgttgaga ctctctttta ctcgttctat ttgtacaata ctaaactttt attcatctc
(2) INFORMATION FOR SEQ ID NO: 140:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 99 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..99

(D) OTHER INFORMATION: / Ceres Seq. ID 1581383
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
Ser Asp Glu Ser Val Lys Lys Val Pro Lys Leu Ser Tyr Lys Ile Ser
1 5 10 15

Leu Ala Arg Pro Ala Leu Lys Phe Asn Glu Glu Ala Val Leu Ala Leu
20 25 30

Tyr Ser Gly Asp Val Lys Ser Ala Thr Gly Leu Gln Thr Tyr Leu Leu
35 40 45

Ser Arg Asp His Ser Asn Leu Lys Ser Glu Phe Gln Ala Arg Asp Gly
50 55 60

Lys Ile Thr Val Ser Cys Ile Glu Asn Val Pro Val Ala Thr Val Val
65 70 75 80

Leu Gly Glu His Leu His Leu Ser Val Gly Asp Asp Leu Leu Ser Lys
85 90 95

Arg Asn Ala

(2) INFORMATION FOR SEQ ID NO: 141:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1066 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1066

(D) OTHER INFORMATION: / Ceres Seq. ID 158138
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:
ctaagattaa tgcaccaccc cagtataaac atttgattgc aatccacgct gggaaaccaa 60
aaggcggaat tactgagtgt taaaaggtag atcctgacaa ggttagacgc cttagcttaa 120
gcgaagaaca acttgaaaag gcagcagaaa aatatgcgaa GcagaNttgt gaggtttact 180
cagcacaatt tgctgaggat atacccaaaa ggaactagag tcacatcatc aaattacaac 240
ccgttggttg gctggagcca cggtgctcaa atggtggctt tcaatatgca ggggtatgga 300
agatcattgt ggctaatgca gggaatgttt agagccaatg gcggatgtgg ctacatcaag 360
aaacctgacc ttctgctgaa aagtggttcg gatagtgaca tatttgaccc aaaagctact 420
ctacctgtga aaacaacact tagggtaact gtatacatgg gagaaggctg gtactttgat 480
ttccgccaca cacactttga ccaatattca ccgcctgact tctatacaag ggttgggata 540
gctggagttc caggggatac agtaatgaag aagacgaaaa cactggagga taattggata 600
ccagcttggg atgaggtgtt tgagttccca ttaaccgttc cagagctggc cctgctacga 660
ttagaagtgc atgagtatga catgtcagag aaagatgatt ttggaggaca gacatgctta 720
ccggtttggg agctaagcga aggaataaga gcatttcctt tacatagccg gaaaggagag 780
aagtacaaat ccgtaagctt ctcgtgaagg tggagtttgt gtgagacatg attaagaaag 840
atacagcatt taaacttgtc tggtaggttt gagtgccaca gagtttgtga atttggagga 900
tttgtatgtg tgagagagaa tgtatgcttg atttgttgtt tgtaatatta tgtacaaaaa 960
agaccaaagt gagagagaga gacggtcttg tatcgtgtgt tgttgtatta ttatgtagta 1020
gtctcgtgta acacttgaat cgtatgatga tttgtggttt ttgtgt
(2) INFORMATION FOR SEQ ID NO: 142:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 217 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..217

(D) OTHER INFORMATION: / Ceres Seq. ID 1581385
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:
Met Arg Ser Arg Xaa Val Arg Phe Thr Gln His Asn Leu Leu Arg Ile
1 5 10 15

Tyr Pro Lys Gly Thr Arg Val Thr Ser Ser Asn Tyr Asn Pro Leu Val
20 25 30

Gly Trp Ser His Gly Ala Gln Met Val Ala Phe Asn Met Gln Gly Tyr
35 40 45

Gly Arg Ser Leu Trp Leu Met Gln Gly Met Phe Arg Ala Asn Gly Gly
50 55 60

Cys Gly Tyr Ile Lys Lys Pro Asp Leu Leu Leu Lys Ser Gly Ser Asp
65 70 75 80

Ser Asp Ile Phe Asp Pro Lys Ala Thr Leu Pro Val Lys Thr Thr Leu
85 90 95

Arg Val Thr Val Tyr Met Gly Glu Gly Trp Tyr Phe Asp Phe Arg His
100 105 110

Thr His Phe Asp Gln Tyr Ser Pro Pro Asp Phe Tyr Thr Arg Val Gly
115 120 125

Ile Ala Gly Val Pro Gly Asp Thr Val Met Lys Lys Thr Lys Thr Leu
130 135 140

Glu Asp Asn Trp Ile Pro Ala Trp Asp Glu Val Phe Glu Phe Pro Leu
145 150 155 160

Thr Val Pro Glu Leu Ala Leu Leu Arg Leu Glu Val His Glu Tyr Asp
165 170 175

Met Ser Glu Lys Asp Asp Phe Gly Gly Gln Thr Cys Leu Pro Val Trp
180 185 190

Glu Leu Ser Glu Gly Ile Arg Ala Phe Pro Leu His Ser Arg Lys Gly
195 200 205

Glu Lys Tyr Lys Ser Val Ser Phe Ser
210 215

(2) INFORMATION FOR SEQ ID NO: 143:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 178 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..178

(D) OTHER INFORMATION: / Ceres Seq. ID 1581386
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
Met Val Ala Phe Asn Met Gln Gly Tyr Gly Arg Ser Leu Trp Leu Met
1 5 10 15

Gln Gly Met Phe Arg Ala Asn Gly Gly Cys Gly Tyr Ile Lys Lys Pro
20 25 30

Asp Leu Leu Leu Lys Ser Gly Ser Asp Ser Asp Ile Phe Asp Pro Lys
35 40 45

Ala Thr Leu Pro Val Lys Thr Thr Leu Arg Val Thr Val Tyr Met Gly
50 55 60

Glu Gly Trp Tyr Phe Asp Phe Arg His Thr His Phe Asp Gln Tyr Ser
65 70 75 80

Pro Pro Asp Phe Tyr Thr Arg Val Gly Ile Ala Gly Val Pro Gly Asp
85 90 95

Thr Val Met Lys Lys Thr Lys Thr Leu Glu Asp Asn Trp Ile Pro Ala
100 105 110

Trp Asp Glu Val Phe Glu Phe Pro Leu Thr Val Pro Glu Leu Ala Leu
115 120 125

Leu Arg Leu Glu Val His Glu Tyr Asp Met Ser Glu Lys Asp Asp Phe
130 135 140

Gly Gly Gln Thr Cys Leu Pro Val Trp Glu Leu Ser Glu Gly Ile Arg
145 150 155 160

Ala Phe Pro Leu His Ser Arg Lys Gly Glu Lys Tyr Lys Ser Val Ser
165 170 175

Phe Ser



(2) INFORMATION FOR SEQ ID NO: 144: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 173 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..173 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581387 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
Met Gln Gly Tyr Gly Arg Ser Leu Trp Leu Met Gln Gly Met Phe Arg
1 5 10 15

Ala Asn Gly Gly Cys Gly Tyr Ile Lys Lys Pro Asp Leu Leu Leu Lys
20 25 30

Ser Gly Ser Asp Ser Asp Ile Phe Asp Pro Lys Ala Thr Leu Pro Val
35 40 45

Lys Thr Thr Leu Arg Val Thr Val Tyr Met Gly Glu Gly Trp Tyr Phe
50 55 60

Asp Phe Arg His Thr His Phe Asp Gln Tyr Ser Pro Pro Asp Phe Tyr
65 70 75 80

Thr Arg Val Gly Ile Ala Gly Val Pro Gly Asp Thr Val Met Lys Lys
85 90 95

Thr Lys Thr Leu Glu Asp Asn Trp Ile Pro Ala Trp Asp Glu Val Phe
100 105 110

Glu Phe Pro Leu Thr Val Pro Glu Leu Ala Leu Leu Arg Leu Glu Val
115 120 125

His Glu Tyr Asp Met Ser Glu Lys Asp Asp Phe Gly Gly Gln Thr Cys
130 135 140

Leu Pro Val Trp Glu Leu Ser Glu Gly Ile Arg Ala Phe Pro Leu His
145 150 155 160

Ser Arg Lys Gly Glu Lys Tyr Lys Ser Val Ser Phe Ser
165 170








(2) INFORMATION FOR SEQ ID NO: 145:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1224 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1224

(D) OTHER INFORMATION: / Ceres Seq. ID 1581438
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
actaataatt cgtttaaacg caaagtttca gaaaacagac caacatggaa gatcggtgct 60
tgatcaagaa cgatgtcact gaattgattg gtaacacacc aatggtgtat ctgaacaatg 120
ttgttgatgg ttgcgtggct cgtatcgctg caaagcttga gatgatggag ccttgttcta 180
gcgtcaaaga cagaatcgcg tatagtatga tcaaagatgc agaagaaaaa ggattgatta 240
ctcccggaaa gagtacattg atagagccaa ctgctggtaa caccgggatt ggtttagctt 300
gcatgggagc tgcaagaggc tataaagtga tccttgtgat gccttcaact atgagcttag 360
agagaagaat cattctgagg gcactaggtg cagagcttca tctctcggac cagcgcatag 420
gccttaaagg aatgttggag aaaactgaag cgattttaag caaaactcct ggtggttaca 480
ttccacaaca atttgaaaat cctgcaaacc ccgagattca ttaccgaacc acgggaccgg 540
aaatatggag agattcagcc gggaaagtag atatattggt cgctggcgta gggactggtg 600
gaactgctac tggagtaggg aagttcctca aggagcagaa caaagacatc aaggtttgtg 660
tggtggaacc agtagaaagt ccggtactta gcggaggtca accaggtcca catttgattc 720
agggaattgg ctctggtatc gtcccattca atttggactt aaccattgtt gatgaaatta 780
ttcaagtggc aggtgaagag gctattgaaa cagccaagct tcttgccctc aaagaaggat 840
tactggtggg aatatcctct ggagccgcag cagcggctgc gttaaaggtt gcaaagcggc 900
cagaaaacgc ggggaaactc attgtggtgg tttttcctag tggaggagaa cgttatttat 960
cgactaaact gttcgattcg attagatatg aagcagagaa tttgcctatt gaatgaatgt 1020
tggacgtgtg ccggtttcgt gtgtaataag agacgttcgt tccaagagtc caagatttca 1080
atttgcttat tactggtcaa tgtgaaatgt ataaaaagtt gcatattcga ttatgaataa 1140
ttttagttga ctacttccat tgcaattctt ggtcgaaaac atgaatttta ataaaacaga 1200
ataatctaca tgaacgtttc attg
(2) INFORMATION FOR SEQ ID NO: 146:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 323 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..323

(D) OTHER INFORMATION: / Ceres Seq. ID 1581439
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:

Met Glu Asp Arg Cys Leu Ile Lys Asn Asp Val Thr Glu Leu Ile Gly
1 5 10 15

Asn Thr Pro Met Val Tyr Leu Asn Asn Val Val Asp Gly Cys Val Ala
20 25 30

Arg Ile Ala Ala Lys Leu Glu Met Met Glu Pro Cys Ser Ser Val Lys
35 40 45

Asp Arg Ile Ala Tyr Ser Met Ile Lys Asp Ala Glu Glu Lys Gly Leu
50 55 60

Ile Thr Pro Gly Lys Ser Thr Leu Ile Glu Pro Thr Ala Gly Asn Thr
65 70 75 80

Gly Ile Gly Leu Ala Cys Met Gly Ala Ala Arg Gly Tyr Lys Val Ile
85 90 95

Leu Val Met Pro Ser Thr Met Ser Leu Glu Arg Arg Ile Ile Leu Arg
100 105 110

Ala Leu Gly Ala Glu Leu His Leu Ser Asp Gln Arg Ile Gly Leu Lys
115 120 125

Gly Met Leu Glu Lys Thr Glu Ala Ile Leu Ser Lys Thr Pro Gly Gly
130 135 140

Tyr Ile Pro Gln Gln Phe Glu Asn Pro Ala Asn Pro Glu Ile His Tyr
145 150 155 160

Arg Thr Thr Gly Pro Glu Ile Trp Arg Asp Ser Ala Gly Lys Val Asp
165 170 175

Ile Leu Val Ala Gly Val Gly Thr Gly Gly Thr Ala Thr Gly Val Gly
180 185 190

Lys Phe Leu Lys Glu Gln Asn Lys Asp Ile Lys Val Cys Val Val Glu
195 200 205

Pro Val Glu Ser Pro Val Leu Ser Gly Gly Gln Pro Gly Pro His Leu
210 215 220

Ile Gln Gly Ile Gly Ser Gly Ile Val Pro Phe Asn Leu Asp Leu Thr
225 230 235 240

Ile Val Asp Glu Ile Ile Gln Val Ala Gly Glu Glu Ala Ile Glu Thr
245 250 255

Ala Lys Leu Leu Ala Leu Lys Glu Gly Leu Leu Val Gly Ile Ser Ser
260 265 270

Gly Ala Ala Ala Ala Ala Ala Leu Lys Val Ala Lys Arg Pro Glu Asn
275 280 285

Ala Gly Lys Leu Ile Val Val Val Phe Pro Ser Gly Gly Glu Arg Tyr
290 295 300

Leu Ser Thr Lys Leu Phe Asp Ser Ile Arg Tyr Glu Ala Glu Asn Leu
305 310 315 320

Pro Ile Glu



(2) INFORMATION FOR SEQ ID NO: 147:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 304 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..304

(D) OTHER INFORMATION: / Ceres Seq. ID 1581440
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:

Met Val Tyr Leu Asn Asn Val Val Asp Gly Cys Val Ala Arg Ile Ala
1 5 10 15

Ala Lys Leu Glu Met Met Glu Pro Cys Ser Ser Val Lys Asp Arg Ile
20 25 30

Ala Tyr Ser Met Ile Lys Asp Ala Glu Glu Lys Gly Leu Ile Thr Pro
35 40 45

Gly Lys Ser Thr Leu Ile Glu Pro Thr Ala Gly Asn Thr Gly Ile Gly
50 55 60

Leu Ala Cys Met Gly Ala Ala Arg Gly Tyr Lys Val Ile Leu Val Met
65 70 75 80

Pro Ser Thr Met Ser Leu Glu Arg Arg Ile Ile Leu Arg Ala Leu Gly
85 90 95

Ala Glu Leu His Leu Ser Asp Gln Arg Ile Gly Leu Lys Gly Met Leu
100 105 110

Glu Lys Thr Glu Ala Ile Leu Ser Lys Thr Pro Gly Gly Tyr Ile Pro
115 120 125

Gln Gln Phe Glu Asn Pro Ala Asn Pro Glu Ile His Tyr Arg Thr Thr
130 135 140

Gly Pro Glu Ile Trp Arg Asp Ser Ala Gly Lys Val Asp Ile Leu Val
145 150 155 160

Ala Gly Val Gly Thr Gly Gly Thr Ala Thr Gly Val Gly Lys Phe Leu
165 170 175

Lys Glu Gln Asn Lys Asp Ile Lys Val Cys Val Val Glu Pro Val Glu
180 185 190

Ser Pro Val Leu Ser Gly Gly Gln Pro Gly Pro His Leu Ile Gln Gly
195 200 205

Ile Gly Ser Gly Ile Val Pro Phe Asn Leu Asp Leu Thr Ile Val Asp
210 215 220

Glu Ile Ile Gln Val Ala Gly Glu Glu Ala Ile Glu Thr Ala Lys Leu
225 230 235 240

Leu Ala Leu Lys Glu Gly Leu Leu Val Gly Ile Ser Ser Gly Ala Ala
245 250 255

Ala Ala Ala Ala Leu Lys Val Ala Lys Arg Pro Glu Asn Ala Gly Lys
260 265 270

Leu Ile Val Val Val Phe Pro Ser Gly Gly Glu Arg Tyr Leu Ser Thr
275 280 285

Lys Leu Phe Asp Ser Ile Arg Tyr Glu Ala Glu Asn Leu Pro Ile Glu
290 295 300



(2) INFORMATION FOR SEQ ID NO: 148: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 284 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..284 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581441 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148: 



Met Met Glu Pro Cys Ser Ser Val Lys Asp Arg Ile Ala Tyr Ser Met
1 5 10 15

Ile Lys Asp Ala Glu Glu Lys Gly Leu Ile Thr Pro Gly Lys Ser Thr
20 25 30

Leu Ile Glu Pro Thr Ala Gly Asn Thr Gly Ile Gly Leu Ala Cys Met
35 40 45

Gly Ala Ala Arg Gly Tyr Lys Val Ile Leu Val Met Pro Ser Thr Met
50 55 60

Ser Leu Glu Arg Arg Ile Ile Leu Arg Ala Leu Gly Ala Glu Leu His
65 70 75 80

Leu Ser Asp Gln Arg Ile Gly Leu Lys Gly Met Leu Glu Lys Thr Glu
85 90 95

Ala Ile Leu Ser Lys Thr Pro Gly Gly Tyr Ile Pro Gln Gln Phe Glu
100 105 110

Asn Pro Ala Asn Pro Glu Ile His Tyr Arg Thr Thr Gly Pro Glu Ile
115 120 125

Trp Arg Asp Ser Ala Gly Lys Val Asp Ile Leu Val Ala Gly Val Gly
130 135 140

Thr Gly Gly Thr Ala Thr Gly Val Gly Lys Phe Leu Lys Glu Gln Asn
145 150 155 160

Lys Asp Ile Lys Val Cys Val Val Glu Pro Val Glu Ser Pro Val Leu
165 170 175

Ser Gly Gly Gln Pro Gly Pro His Leu Ile Gln Gly Ile Gly Ser Gly
180 185 190

Ile Val Pro Phe Asn Leu Asp Leu Thr Ile Val Asp Glu Ile Ile Gln
195 200 205

Val Ala Gly Glu Glu Ala Ile Glu Thr Ala Lys Leu Leu Ala Leu Lys
210 215 220

Glu Gly Leu Leu Val Gly Ile Ser Ser Gly Ala Ala Ala Ala Ala Ala
225 230 235 240

Leu Lys Val Ala Lys Arg Pro Glu Asn Ala Gly Lys Leu Ile Val Val
245 250 255

Val Phe Pro Ser Gly Gly Glu Arg Tyr Leu Ser Thr Lys Leu Phe Asp
260 265 270

Ser Ile Arg Tyr Glu Ala Glu Asn Leu Pro Ile Glu
275 280



(2) INFORMATION FOR SEQ ID NO: 149: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 648 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..648 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581454 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149: 
aaaaaacaaa aagcgtcact aggaagctgc gactatttga ctgactctaa tttcacgcaa 60
ctccttgtga gattctccga ttgcttcgcc tgagtttctg ggttaactct ccacgccgtc 120
actttgatct ttgcttttcc gatctttagg gtttttgttt agggatttat cgaagactct 180
gctctatcat gtctggaaga aaagaaacgg ttttagattt ggccaagttt gtagataagg 240
gtgtgcaagt taagctcact ggtggtagac aagtgactgg aactcttaaa ggctatgacc 300
aattgcttaa tcttgttctt gatgaagcag tcgagtttgt tcgagatcat gatgatcctt 360
tgaagactac ggatcagaca agacgccttg gtttgattgt ttgccgtgga acagcggtga 420
tgcttgtctc accaaccgat ggcaccgaag aaatcgctaa cccgttcgtt acagcagagg 480
ctgtctaaaa gactttcttc tcaaacaata tgtctctcta ttagtttaac ttggcgattt 540
agagagtatt ttatctaact ctctggtgtg atgttggaaa catatatgtt caatttaaac 600
tatttggaac atcatggaca cttcttcttg ttacctaact tcgttccc
(2) INFORMATION FOR SEQ ID NO: 150:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 99 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..99

(D) OTHER INFORMATION: / Ceres Seq. ID 1581455
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:
Met Ser Gly Arg Lys Glu Thr Val Leu Asp Leu Ala Lys Phe Val Asp
1 5 10 15

Lys Gly Val Gln Val Lys Leu Thr Gly Gly Arg Gln Val Thr Gly Thr
20 25 30

Leu Lys Gly Tyr Asp Gln Leu Leu Asn Leu Val Leu Asp Glu Ala Val
35 40 45

Glu Phe Val Arg Asp His Asp Asp Pro Leu Lys Thr Thr Asp Gln Thr
50 55 60

Arg Arg Leu Gly Leu Ile Val Cys Arg Gly Thr Ala Val Met Leu Val
65 70 75 80

Ser Pro Thr Asp Gly Thr Glu Glu Ile Ala Asn Pro Phe Val Thr Ala
85 90 95

Glu Ala Val

(2) INFORMATION FOR SEQ ID NO: 151:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1016 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1016

(D) OTHER INFORMATION: / Ceres Seq. ID 1581498
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
ccatattatt cacgattctc atcaaatcat ctccgatact cacaaccgaa ataactaacc 60
cctcctcaac aaaaaacaac aaaacatgta cactccatca tacttaaaat attcaataat 120
ctcaattata tccgtattat tcctccaagg aactcatgga gacgacggag gttggcaagg 180
tggtcacgcc acgttttacg gcggcgaaga tgcttccggc accatgggcg gagcttgtgg 240
ctatggaaat ttgtatggcc aaggttacgg gacgaacacg gcggctttaa gtacggctct 300
attcaacaac ggactcacgt gtggcgcgtg ctatgagatg aagtgtaacg atgacccgag 360
gtggtgtctc gggtcaacca tcaccgtcac agctacaaac ttttgcccac ctaaccctgg 420
cctctccaac gataatggag gttggtgcaa tcctcctctt cagcatttcg acctcgccga 480
gccagctttt cttcagatcg ctcagtatcg tgccggcatt gttcctgtct ctttccgaag 540
agtaccatgt atgaagaaag gaggaataag gtttacgatc aacggacact catacttcaa 600
cctcgttctg atctccaacg taggaggagc aggagacgta cacgccgtct caatcaaagg 660
ctcaaaaaca cagtcgtggc aagcgatgtc tagaaactgg ggacaaaact ggcagagcaa 720
ttcatacatg aacgaccaaa gcctttcctt ccaggtaacg accagcgatg gtcgcacact 780
cgttagcaac gacgtggctc cttctaattg gcagttcgga caaacctacc aaggtggtca 840
gttctgatcc aaaccatcat ccacatctct ctgttttggg tgctgacgtg gctgcatatt 900
gctgaggtgg ctcgtaagca cccgcttaat tagcttagcc tttttttctc ttatttacga 960
attattgctt caatggttgt attttcattg tgcctacaaa aaagcaaggt tttttt
(2) INFORMATION FOR SEQ ID NO: 152: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 253 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1 . . 253 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581499 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152: 



Met Tyr Thr Pro Ser Tyr Leu Lys Tyr Ser Ile Ile Ser Ile Ile Ser
1 5 10 15

Val Leu Phe Leu Gln Gly Thr His Gly Asp Asp Gly Gly Trp Gln Gly
20 25 30

Gly His Ala Thr Phe Tyr Gly Gly Glu Asp Ala Ser Gly Thr Met Gly
35 40 45

Gly Ala Cys Gly Tyr Gly Asn Leu Tyr Gly Gln Gly Tyr Gly Thr Asn
50 55 60

Thr Ala Ala Leu Ser Thr Ala Leu Phe Asn Asn Gly Leu Thr Cys Gly
65 70 75 80

Ala Cys Tyr Glu Met Lys Cys Asn Asp Asp Pro Arg Trp Cys Leu Gly
85 90 95

Ser Thr Ile Thr Val Thr Ala Thr Asn Phe Cys Pro Pro Asn Pro Gly
100 105 110

Leu Ser Asn Asp Asn Gly Gly Trp Cys Asn Pro Pro Leu Gln His Phe
115 120 125

Asp Leu Ala Glu Pro Ala Phe Leu Gln Ile Ala Gln Tyr Arg Ala Gly
130 135 140

Ile Val Pro Val Ser Phe Arg Arg Val Pro Cys Met Lys Lys Gly Gly
145 150 155 160

Ile Arg Phe Thr Ile Asn Gly His Ser Tyr Phe Asn Leu Val Leu Ile
165 170 175

Ser Asn Val Gly Gly Ala Gly Asp Val His Ala Val Ser Ile Lys Gly
180 185 190

Ser Lys Thr Gln Ser Trp Gln Ala Met Ser Arg Asn Trp Gly Gln Asn
195 200 205

Trp Gln Ser Asn Ser Tyr Met Asn Asp Gln Ser Leu Ser Phe Gln Val
210 215 220

Thr Thr Ser Asp Gly Arg Thr Leu Val Ser Asn Asp Val Ala Pro Ser
225 230 235 240

Asn Trp Gln Phe Gly Gln Thr Tyr Gln Gly Gly Gln Phe
245 250
(2) INFORMATION FOR SEQ ID NO: 153: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 207 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..207

(D) OTHER INFORMATION: / Ceres Seq. ID 1581500
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:

Met Gly Gly Ala Cys Gly Tyr Gly Asn Leu Tyr Gly Gln Gly Tyr Gly
1 5 10 15

Thr Asn Thr Ala Ala Leu Ser Thr Ala Leu Phe Asn Asn Gly Leu Thr
20 25 30

Cys Gly Ala Cys Tyr Glu Met Lys Cys Asn Asp Asp Pro Arg Trp Cys
35 40 45

Leu Gly Ser Thr Ile Thr Val Thr Ala Thr Asn Phe Cys Pro Pro Asn
50 55 60

Pro Gly Leu Ser Asn Asp Asn Gly Gly Trp Cys Asn Pro Pro Leu Gln
65 70 75 80

His Phe Asp Leu Ala Glu Pro Ala Phe Leu Gln Ile Ala Gln Tyr Arg
85 90 95

Ala Gly Ile Val Pro Val Ser Phe Arg Arg Val Pro Cys Met Lys Lys
100 105 110

Gly Gly Ile Arg Phe Thr Ile Asn Gly His Ser Tyr Phe Asn Leu Val
115 120 125

Leu Ile Ser Asn Val Gly Gly Ala Gly Asp Val His Ala Val Ser Ile
130 135 140

Lys Gly Ser Lys Thr Gln Ser Trp Gln Ala Met Ser Arg Asn Trp Gly
145 150 155 160

Gln Asn Trp Gln Ser Asn Ser Tyr Met Asn Asp Gln Ser Leu Ser Phe
165 170 175

Gln Val Thr Thr Ser Asp Gly Arg Thr Leu Val Ser Asn Asp Val Ala
180 185 190

Pro Ser Asn Trp Gln Phe Gly Gln Thr Tyr Gln Gly Gly Gln Phe
195 200 205









(2) INFORMATION FOR SEQ ID NO: 154: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 169 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..169

(D) OTHER INFORMATION: / Ceres Seq. ID 1581501
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:

Met Lys Cys Asn Asp Asp Pro Arg Trp Cys Leu Gly Ser Thr Ile Thr
1 5 10 15

Val Thr Ala Thr Asn Phe Cys Pro Pro Asn Pro Gly Leu Ser Asn Asp
20 25 30

Asn Gly Gly Trp Cys Asn Pro Pro Leu Gln His Phe Asp Leu Ala Glu
35 40 45

Pro Ala Phe Leu Gln Ile Ala Gln Tyr Arg Ala Gly Ile Val Pro Val
50 55 60

Ser Phe Arg Arg Val Pro Cys Met Lys Lys Gly Gly Ile Arg Phe Thr
65 70 75 80

Ile Asn Gly His Ser Tyr Phe Asn Leu Val Leu Ile Ser Asn Val Gly
85 90 95

Gly Ala Gly Asp Val His Ala Val Ser Ile Lys Gly Ser Lys Thr Gln
100 105 110

Ser Trp Gln Ala Met Ser Arg Asn Trp Gly Gln Asn Trp Gln Ser Asn
115 120 125

Ser Tyr Met Asn Asp Gln Ser Leu Ser Phe Gln Val Thr Thr Ser Asp
130 135 140

Gly Arg Thr Leu Val Ser Asn Asp Val Ala Pro Ser Asn Trp Gln Phe
145 150 155 160

Gly Gln Thr Tyr Gln Gly Gly Gln Phe
165

(2) INFORMATION FOR SEQ ID NO: 155:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 491 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..491

(D) OTHER INFORMATION: / Ceres Seq. ID 1581567
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:
ctctgcgtgc cctagttatt cagatcgccg aatttgcagt aaggagacga aaatatggtg 60
cacgtttgct actaccgcaa ctatggaaag accttcaagg gaccacgtcg tccttacgag 120
aaggagcgtc ttgattctga attgaagctg gttggtgagt atggtctgcg taacaagcgt 180
gagctctgga gagtgcagta ctctcttagc cgtatccgta atgctgctag agatcttttg 240
actcttgatg agaagagtcc aagaaggatc tttgaaggtg aggctttgct ccgtagGatg 300
aaccgttacg ggcttcttga tgagagccag aacaagctcg attacgtctt ggctttgact 360
gttgagaact ttcttgagcg tcgtcttcag actattgtgt tcaagtctgg tatggctaag 420
tctatccatc actctcgtgt cctcatcagg cagaggcata tcagggttgg aaagcaattg 480
gtgaacattc c

(2) INFORMATION FOR SEQ ID NO: 156:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 145 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..145

(D) OTHER INFORMATION: / Ceres Seq. ID 1581568
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:

Met Val His Val Cys Tyr Tyr Arg Asn Tyr Gly Lys Thr Phe Lys Gly
1 5 10 15

Pro Arg Arg Pro Tyr Glu Lys Glu Arg Leu Asp Ser Glu Leu Lys Leu
20 25 30

Val Gly Glu Tyr Gly Leu Arg Asn Lys Arg Glu Leu Trp Arg Val Gln
35 40 45

Tyr Ser Leu Ser Arg Ile Arg Asn Ala Ala Arg Asp Leu Leu Thr Leu
50 55 60

Asp Glu Lys Ser Pro Arg Arg Ile Phe Glu Gly Glu Ala Leu Leu Arg
65 70 75 80

Arg Met Asn Arg Tyr Gly Leu Leu Asp Glu Ser Gln Asn Lys Leu Asp
85 90 95

Tyr Val Leu Ala Leu Thr Val Glu Asn Phe Leu Glu Arg Arg Leu Gln
100 105 110

Thr Ile Val Phe Lys Ser Gly Met Ala Lys Ser Ile His His Ser Arg
115 120 125

Val Leu Ile Arg Gln Arg His Ile Arg Val Gly Lys Gln Leu Val Asn
130 135 140

Ile
145
(2) INFORMATION FOR SEQ ID NO: 157:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 658 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY: - 

(B) LOCATION: 1..658 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581585 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: 



acttagggtt catagcagcc agagagagag agacaagtga gagggatcta ccaaacgaag 60 

caacaatggt gaagttcttg aagcagaaca aggccgtgat ccttcttcaa ggacgttacg 120 

ccggaaagaa agccgtcatc atcaaatcct tcgacgacgg taaccgtgat cgtccttacg 180 

gacactgcct cgtcgccgga ctgaagaagt acccgagcaa agtcatccgc aaagactcag 240 

ctaagaagac agctaagaaa tctagggtta agtgtttcat caagcttgtt aattaccagc 300 

atctgatgcc tactcgttac acactcgacg tggatctcaa ggaagtggcg actcttgatg 360 

ctcttcagag taaggataag aaggttgctg ctcttaagga agctaaggct aagcttgagg 420 

agaggttcaa gactggtaag aacagatggt tctttaccaa gctcaggttc tgaagaaatt 480 

ttctatttcg tgaggaattc attttgagct ttttgttatc gtgtttttta gtttctaggg 540 

ttcatttccc atggtgaaaa atgtggatcc tgttttggat tgtggaagat gttttgttga 600 



agtttggatt atggatttga tcttttattt tgattctcta atgcattctt atttaccc 
(2) INFORMATION FOR SEQ ID NO: 158: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 156 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..156 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581586 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158: 



Leu Gly Phe Ile Ala Ala Arg Glu Arg Glu Thr Ser Glu Arg Asp Leu
1               5                   10                  15

Pro Asn Glu Ala Thr Met Val Lys Phe Leu Lys Gln Asn Lys Ala Val
            20                  25                  30

Ile Leu Leu Gln Gly Arg Tyr Ala Gly Lys Lys Ala Val Ile Ile Lys
        35                  40                  45

Ser Phe Asp Asp Gly Asn Arg Asp Arg Pro Tyr Gly His Cys Leu Val
    50                  55                  60

Ala Gly Leu Lys Lys Tyr Pro Ser Lys Val Ile Arg Lys Asp Ser Ala
65                  70                  75                  80

Lys Lys Thr Ala Lys Lys Ser Arg Val Lys Cys Phe Ile Lys Leu Val
                85                  90                  95

Asn Tyr Gln His Leu Met Pro Thr Arg Tyr Thr Leu Asp Val Asp Leu
            100                 105                 110

Lys Glu Val Ala Thr Leu Asp Ala Leu Gln Ser Lys Asp Lys Lys Val
        115                 120                 125

Ala Ala Leu Lys Glu Ala Lys Ala Lys Leu Glu Glu Arg Phe Lys Thr
    130                 135                 140

Gly Lys Asn Arg Trp Phe Phe Thr Lys Leu Arg Phe
145                 150                 155

(2) INFORMATION FOR SEQ ID NO: 159: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 135 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..135 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581587 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159: 



Met Val Lys Phe Leu Lys Gln Asn Lys Ala Val Ile Leu Leu Gln Gly
1               5                   10                  15

Arg Tyr Ala Gly Lys Lys Ala Val Ile Ile Lys Ser Phe Asp Asp Gly
            20                  25                  30

Asn Arg Asp Arg Pro Tyr Gly His Cys Leu Val Ala Gly Leu Lys Lys
        35                  40                  45

Tyr Pro Ser Lys Val Ile Arg Lys Asp Ser Ala Lys Lys Thr Ala Lys
    50                  55                  60

Lys Ser Arg Val Lys Cys Phe Ile Lys Leu Val Asn Tyr Gln His Leu
65                  70                  75                  80

Met Pro Thr Arg Tyr Thr Leu Asp Val Asp Leu Lys Glu Val Ala Thr
                85                  90                  95

Leu Asp Ala Leu Gln Ser Lys Asp Lys Lys Val Ala Ala Leu Lys Glu
            100                 105                 110

Ala Lys Ala Lys Leu Glu Glu Arg Phe Lys Thr Gly Lys Asn Arg Trp
        115                 120                 125

Phe Phe Thr Lys Leu Arg Phe
    130                 135
(2) INFORMATION FOR SEQ ID NO: 160: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1461 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1461

(D) OTHER INFORMATION: / Ceres Seq. ID 1581608 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160: 



atcttaaaag aaaccgaaag ctaatctcca aagggaaaaa gacgtgtagc agagacctat 60 

acatacacct atacattcac gttgcgtgtg ataatctact gagggttcta tctatattaa 120 

tccttctctg cttccttcaa tctaattgca aatcaaatgg atcgtcttaa gctttatttc 180

tccgttttcg ttttgtcttt ctttatcgtc tcggtttcgt cgtctgatgt caacgacggc 240

gatgatctcg tgatccgtca ggtggttggt ggagccgagc ctcaggtttt gacctcagag 300

gatcactttt ctctcttcaa gcggaagttc gggaaggtct acgcttccaa cgaggagcat 360

gactatagat tctcggtttt caaagcgaat ctcaggcgag cgaggcgtca ccagaagttg 420

gatccgtcgg cgactcatgg tgttacgcag ttctcagatc tgactcggtc tgagttccgt 480

aagaagcact tgggggttag aagtggcttt aagcttccta aagatgccaa caaggctccg 540

attctcccta ccgaaaatct ccctgaggat tttgattgga gagatcatgg cgccgttact 600

cccgtcaaaa atcagggatc ttgcggctct tgctggagtt tcagcgccac tggagctttg 660

gaaggtgcta acttcctcgc taccggcaaa ctcgtcagcc tcagcgaaca acagctcgtc 720

gactgtgatc acgagtgtga tcccgaggag gcagattcct gcgactctgg ttgcaatggt 780

gggctaatga acagcgcttt tgaacacacc ctcaaaaccg gagggctcat gaaagaagaa 840

gactatcctt acaccggaaa ggacggcaag acctgcaagc tagacaagtc caagatcgtt 900

gcctctgtct ccaacttcag tgttatctcc attgatgaag aacagattgc tgcaaacctt 960

gtcaagaacg gacctcttgc tgtagccatc aacgctggct atatgcagac ttacattgga 1020

ggagtctcat gcccttacat atgcaccagg aggctcaacc acggtgtctt attggttggc 1080

tatggagcgg caggttacgc tccggctagg ttcaaggaga agccttactg gatcatcaag 1140

aactcgtggg gagagacttg gggtgaaaat ggtttctaca aaatctgcaa aggccgtaac 1200

atttgtggtg ttgacagtat ggtctccact gttgcagcca ccgtctcaac caccgcccat 1260

taagcatctc gtcaataagt tttaattact ttggtgattt gtatgagcga gctctctttg 1320

cgctgctgac tctctctatt tatctctgct tcttgcttgt aaataaaatg cgttctattg 1380

agcaaaaccg tacaatgctc attagcaatg agcccacttt aacaagtatc agtttatggc 1440
ccaatatatg aaagttaaat c
(2) INFORMATION FOR SEQ ID NO: 161:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..368

(D) OTHER INFORMATION: / Ceres Seq. ID 1581609
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161:
Met Asp Arg Leu Lys Leu Tyr Phe Ser Val Phe Val Leu Ser Phe Phe
1               5                   10                  15

Ile Val Ser Val Ser Ser Ser Asp Val Asn Asp Gly Asp Asp Leu Val
            20                  25                  30

Ile Arg Gln Val Val Gly Gly Ala Glu Pro Gln Val Leu Thr Ser Glu
        35                  40                  45

Asp His Phe Ser Leu Phe Lys Arg Lys Phe Gly Lys Val Tyr Ala Ser
    50                  55                  60

Asn Glu Glu His Asp Tyr Arg Phe Ser Val Phe Lys Ala Asn Leu Arg
65                  70                  75                  80

Arg Ala Arg Arg His Gln Lys Leu Asp Pro Ser Ala Thr His Gly Val
                85                  90                  95

Thr Gln Phe Ser Asp Leu Thr Arg Ser Glu Phe Arg Lys Lys His Leu
            100                 105                 110

Gly Val Arg Ser Gly Phe Lys Leu Pro Lys Asp Ala Asn Lys Ala Pro
        115                 120                 125

Ile Leu Pro Thr Glu Asn Leu Pro Glu Asp Phe Asp Trp Arg Asp His
    130                 135                 140

Gly Ala Val Thr Pro Val Lys Asn Gln Gly Ser Cys Gly Ser Cys Trp
145                 150                 155                 160

Ser Phe Ser Ala Thr Gly Ala Leu Glu Gly Ala Asn Phe Leu Ala Thr
                165                 170                 175

Gly Lys Leu Val Ser Leu Ser Glu Gln Gln Leu Val Asp Cys Asp His
            180                 185                 190

Glu Cys Asp Pro Glu Glu Ala Asp Ser Cys Asp Ser Gly Cys Asn Gly
        195                 200                 205

Gly Leu Met Asn Ser Ala Phe Glu His Thr Leu Lys Thr Gly Gly Leu
    210                 215                 220

Met Lys Glu Glu Asp Tyr Pro Tyr Thr Gly Lys Asp Gly Lys Thr Cys
225                 230                 235                 240

Lys Leu Asp Lys Ser Lys Ile Val Ala Ser Val Ser Asn Phe Ser Val
                245                 250                 255

Ile Ser Ile Asp Glu Glu Gln Ile Ala Ala Asn Leu Val Lys Asn Gly
            260                 265                 270

Pro Leu Ala Val Ala Ile Asn Ala Gly Tyr Met Gln Thr Tyr Ile Gly
        275                 280                 285

Gly Val Ser Cys Pro Tyr Ile Cys Thr Arg Arg Leu Asn His Gly Val
    290                 295                 300

Leu Leu Val Gly Tyr Gly Ala Ala Gly Tyr Ala Pro Ala Arg Phe Lys
305                 310                 315                 320

Glu Lys Pro Tyr Trp Ile Ile Lys Asn Ser Trp Gly Glu Thr Trp Gly
                325                 330                 335

Glu Asn Gly Phe Tyr Lys Ile Cys Lys Gly Arg Asn Ile Cys Gly Val
            340                 345                 350

Asp Ser Met Val Ser Thr Val Ala Ala Thr Val Ser Thr Thr Ala His
        355                 360                 365



(2) INFORMATION FOR SEQ ID NO: 162:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2004 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY:

(B) LOCATION: 1..2004

(D) OTHER INFORMATION: / Ceres Seq. ID 1581621 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162: 
acattcattt tctctggtct ttgattgctt ttcttttcac tgcttaggtt ctttgtctgt 60
taggtttaga gagatccttt ccaaagatcc aaagctggaa actttctagt caatgctgct 120
taagatctga ctttttccct tcaatgtcgc tctgaaatcc cctaaatttt tgggcccatc 180
aggtgaataa agcggaggtc agaagatggg ttctggttta aatgacacat tatctcggaa 240
ctacaatggt ttggagctat gggagataat agtgattgtt ctctctgcga tattcgttgt 300
agttttagct atatcgctat ggcttacttt cagaagaaaa acctctagat cttcttctaa 360
tctaatccct gttagtcgcc agattcctcc tagtgttcct gaagagatta aagagattag 420
agtcgacgag gtttcttcaa gcaatggtgg gaatggatac ccctctatta gtgagaaatt 480
tggcgataaa gaacccgaaa aagggataaa agcagagtca gaaaatggcg atagtagccg 540
gtcaggctcg tttaatcact tggagaaaaa agacggatcg agcgtatctt ctgctaatcc 600
tttgacagct ccatctcctt tgtctggtct tcctgagttt tctcaccttg gatggggaca 660
ttggttcact cttagagatc ttcagatggc tactaatcag ttttcaaggg ataatatcat 720
cggtgatggt ggatatggag ttgtttaccg cggtaacctt gttaatggta ctcctgttgc 780
tgttaaaaag ttgctcaaca atttaggaca agctgataaa gacttcagag ttgaagttga 840
agctataggt cacgttcgac ataaaaactt ggtccgcctt ctcggatatt gtatggaagg 900
aacgcagagg atgctggtgt atgagtatgt taacaatgga aatttggagc aatggctccg 960
tggagacaat caaaatcatg agtatcttac atgggaggca cgagtgaaaa ttcttattgg 1020
gacagccaaa gcgctcgcgt accttcacga ggcgattgag ccaaaagtgg tgcacagaga 1080
cattaagtct agtaacattc tgattgatga caaattcaat tctaaaattt ctgactttgg 1140
acttgctaaa ctacttggtg ctgataagag ttttataact actagagtta tgggtacctt 1200
cggttacgta gctccagagt atgcgaattc cggtcttctg aatgagaaaa gcgatgtcta 1260
cagcttcggg gttgtactct tggaagctat aactggtaga tatccggtag actatgctcg 1320
tccaccaccc gaggtacatt tggtggagtg gctgaagatg atggtccaac aaagacgatc 1380
agaagaagtg gttgatccaa accttgaaac aaaaccatct acaagtgctt tgaaaagaac 1440
actattgact gctttgagat gtgttgatcc aatgtctgag aaaagaccga ggatgagcca 1500
agttgcacgt atgcttgaat ccgaagaata cccaattgct agagaggata ggagaagacg 1560
aaggagtcag aacgggacaa caagagattc agatcctccg aggaacagca cagacactga 1620
caagagtgag taccatgacc taaagcctga aggtggatag ccattggaaa taatggaggt 1680
gtaaattgtc gtaaagtctc gagctttatg ggaagtttca agtccttagt tcttctccag 1740
atcttctgcg ttatttgatg tttcttttct tttgtaataa ggagggagag agagagtgaa 1800
tgagttagtg agtgttcatc gttcgtttgt tgcttgtata gtatttgttt gttgtattat 1860
cgtttgcttc tccacgtttt gcaatgtagg ttgtttttgg actaagtaaa cgctatttgc 1920
aggttttgta gggtatgtaa gtctttgaga gaaacgacgt gtaatttgca atttgcttgt 1980
ggaagatatt gttatttgag agag 
(2) INFORMATION FOR SEQ ID NO: 163: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 484 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..484

(D) OTHER INFORMATION: / Ceres Seq. ID 1581622 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163: 
Met Gly Ser Gly Leu Asn Asp Thr Leu Ser Arg Asn Tyr Asn Gly Leu
1               5                   10                  15

Glu Leu Trp Glu Ile Ile Val Ile Val Leu Ser Ala Ile Phe Val Val
            20                  25                  30

Val Leu Ala Ile Ser Leu Trp Leu Thr Phe Arg Arg Lys Thr Ser Arg
        35                  40                  45

Ser Ser Ser Asn Leu Ile Pro Val Ser Arg Gln Ile Pro Pro Ser Val
    50                  55                  60

Pro Glu Glu Ile Lys Glu Ile Arg Val Asp Glu Val Ser Ser Ser Asn
65                  70                  75                  80

Gly Gly Asn Gly Tyr Pro Ser Ile Ser Glu Lys Phe Gly Asp Lys Glu
                85                  90                  95

Pro Glu Lys Gly Ile Lys Ala Glu Ser Glu Asn Gly Asp Ser Ser Arg
            100                 105                 110

Ser Gly Ser Phe Asn His Leu Glu Lys Lys Asp Gly Ser Ser Val Ser
        115                 120                 125

Ser Ala Asn Pro Leu Thr Ala Pro Ser Pro Leu Ser Gly Leu Pro Glu
    130                 135                 140

Phe Ser His Leu Gly Trp Gly His Trp Phe Thr Leu Arg Asp Leu Gln
145                 150                 155                 160

Met Ala Thr Asn Gln Phe Ser Arg Asp Asn Ile Ile Gly Asp Gly Gly
                165                 170                 175

Tyr Gly Val Val Tyr Arg Gly Asn Leu Val Asn Gly Thr Pro Val Ala
            180                 185                 190

Val Lys Lys Leu Leu Asn Asn Leu Gly Gln Ala Asp Lys Asp Phe Arg
        195                 200                 205

Val Glu Val Glu Ala Ile Gly His Val Arg His Lys Asn Leu Val Arg
    210                 215                 220

Leu Leu Gly Tyr Cys Met Glu Gly Thr Gln Arg Met Leu Val Tyr Glu
225                 230                 235                 240

Tyr Val Asn Asn Gly Asn Leu Glu Gln Trp Leu Arg Gly Asp Asn Gln
                245                 250                 255

Asn His Glu Tyr Leu Thr Trp Glu Ala Arg Val Lys Ile Leu Ile Gly
            260                 265                 270

Thr Ala Lys Ala Leu Ala Tyr Leu His Glu Ala Ile Glu Pro Lys Val
        275                 280                 285

Val His Arg Asp Ile Lys Ser Ser Asn Ile Leu Ile Asp Asp Lys Phe
    290                 295                 300

Asn Ser Lys Ile Ser Asp Phe Gly Leu Ala Lys Leu Leu Gly Ala Asp
305                 310                 315                 320

Lys Ser Phe Ile Thr Thr Arg Val Met Gly Thr Phe Gly Tyr Val Ala
                325                 330                 335

Pro Glu Tyr Ala Asn Ser Gly Leu Leu Asn Glu Lys Ser Asp Val Tyr
            340                 345                 350

Ser Phe Gly Val Val Leu Leu Glu Ala Ile Thr Gly Arg Tyr Pro Val
        355                 360                 365

Asp Tyr Ala Arg Pro Pro Pro Glu Val His Leu Val Glu Trp Leu Lys
    370                 375                 380

Met Met Val Gln Gln Arg Arg Ser Glu Glu Val Val Asp Pro Asn Leu
385                 390                 395                 400

Glu Thr Lys Pro Ser Thr Ser Ala Leu Lys Arg Thr Leu Leu Thr Ala
                405                 410                 415

Leu Arg Cys Val Asp Pro Met Ser Glu Lys Arg Pro Arg Met Ser Gln
            420                 425                 430

Val Ala Arg Met Leu Glu Ser Glu Glu Tyr Pro Ile Ala Arg Glu Asp
        435                 440                 445

Arg Arg Arg Arg Arg Ser Gln Asn Gly Thr Thr Arg Asp Ser Asp Pro
    450                 455                 460

Pro Arg Asn Ser Thr Asp Thr Asp Lys Ser Glu Tyr His Asp Leu Lys
465                 470                 475                 480

Pro Glu Gly Gly

(2) INFORMATION FOR SEQ ID NO: 164:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 324 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..324

(D) OTHER INFORMATION: / Ceres Seq. ID 1581623 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164: 



Met Ala Thr Asn Gln Phe Ser Arg Asp Asn Ile Ile Gly Asp Gly Gly
1               5                   10                  15

Tyr Gly Val Val Tyr Arg Gly Asn Leu Val Asn Gly Thr Pro Val Ala
            20                  25                  30

Val Lys Lys Leu Leu Asn Asn Leu Gly Gln Ala Asp Lys Asp Phe Arg
        35                  40                  45

Val Glu Val Glu Ala Ile Gly His Val Arg His Lys Asn Leu Val Arg
    50                  55                  60

Leu Leu Gly Tyr Cys Met Glu Gly Thr Gln Arg Met Leu Val Tyr Glu
65                  70                  75                  80

Tyr Val Asn Asn Gly Asn Leu Glu Gln Trp Leu Arg Gly Asp Asn Gln
                85                  90                  95

Asn His Glu Tyr Leu Thr Trp Glu Ala Arg Val Lys Ile Leu Ile Gly
            100                 105                 110

Thr Ala Lys Ala Leu Ala Tyr Leu His Glu Ala Ile Glu Pro Lys Val
        115                 120                 125

Val His Arg Asp Ile Lys Ser Ser Asn Ile Leu Ile Asp Asp Lys Phe
    130                 135                 140

Asn Ser Lys Ile Ser Asp Phe Gly Leu Ala Lys Leu Leu Gly Ala Asp
145                 150                 155                 160

Lys Ser Phe Ile Thr Thr Arg Val Met Gly Thr Phe Gly Tyr Val Ala
                165                 170                 175

Pro Glu Tyr Ala Asn Ser Gly Leu Leu Asn Glu Lys Ser Asp Val Tyr
            180                 185                 190

Ser Phe Gly Val Val Leu Leu Glu Ala Ile Thr Gly Arg Tyr Pro Val
        195                 200                 205

Asp Tyr Ala Arg Pro Pro Pro Glu Val His Leu Val Glu Trp Leu Lys
    210                 215                 220

Met Met Val Gln Gln Arg Arg Ser Glu Glu Val Val Asp Pro Asn Leu
225                 230                 235                 240

Glu Thr Lys Pro Ser Thr Ser Ala Leu Lys Arg Thr Leu Leu Thr Ala
                245                 250                 255

Leu Arg Cys Val Asp Pro Met Ser Glu Lys Arg Pro Arg Met Ser Gln
            260                 265                 270

Val Ala Arg Met Leu Glu Ser Glu Glu Tyr Pro Ile Ala Arg Glu Asp
        275                 280                 285

Arg Arg Arg Arg Arg Ser Gln Asn Gly Thr Thr Arg Asp Ser Asp Pro
    290                 295                 300

Pro Arg Asn Ser Thr Asp Thr Asp Lys Ser Glu Tyr His Asp Leu Lys
305                 310                 315                 320

Pro Glu Gly Gly



(2) INFORMATION FOR SEQ ID NO: 165:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 255 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..255

(D) OTHER INFORMATION: / Ceres Seq. ID 1581624
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165: 



Met Glu Gly Thr Gln Arg Met Leu Val Tyr Glu Tyr Val Asn Asn Gly
1               5                   10                  15

Asn Leu Glu Gln Trp Leu Arg Gly Asp Asn Gln Asn His Glu Tyr Leu
            20                  25                  30

Thr Trp Glu Ala Arg Val Lys Ile Leu Ile Gly Thr Ala Lys Ala Leu
        35                  40                  45

Ala Tyr Leu His Glu Ala Ile Glu Pro Lys Val Val His Arg Asp Ile
    50                  55                  60

Lys Ser Ser Asn Ile Leu Ile Asp Asp Lys Phe Asn Ser Lys Ile Ser
65                  70                  75                  80

Asp Phe Gly Leu Ala Lys Leu Leu Gly Ala Asp Lys Ser Phe Ile Thr
                85                  90                  95

Thr Arg Val Met Gly Thr Phe Gly Tyr Val Ala Pro Glu Tyr Ala Asn
            100                 105                 110

Ser Gly Leu Leu Asn Glu Lys Ser Asp Val Tyr Ser Phe Gly Val Val
        115                 120                 125

Leu Leu Glu Ala Ile Thr Gly Arg Tyr Pro Val Asp Tyr Ala Arg Pro
    130                 135                 140

Pro Pro Glu Val His Leu Val Glu Trp Leu Lys Met Met Val Gln Gln
145                 150                 155                 160

Arg Arg Ser Glu Glu Val Val Asp Pro Asn Leu Glu Thr Lys Pro Ser
                165                 170                 175

Thr Ser Ala Leu Lys Arg Thr Leu Leu Thr Ala Leu Arg Cys Val Asp
            180                 185                 190

Pro Met Ser Glu Lys Arg Pro Arg Met Ser Gln Val Ala Arg Met Leu
        195                 200                 205

Glu Ser Glu Glu Tyr Pro Ile Ala Arg Glu Asp Arg Arg Arg Arg Arg
    210                 215                 220

Ser Gln Asn Gly Thr Thr Arg Asp Ser Asp Pro Pro Arg Asn Ser Thr
225                 230                 235                 240

Asp Thr Asp Lys Ser Glu Tyr His Asp Leu Lys Pro Glu Gly Gly
                245                 250                 255





(2) INFORMATION FOR SEQ ID NO: 166:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1378 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1378 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581926 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166: 



atatgatttt tgttctcatt tactaatcaa agttctgcaa acttgtagtt gttgtaggat 60 

ttgttgctct ggctctggtg gtaggtctat gaaatcaacc catatcgtga atggactgca 120 

acatggtatc ttcgtcccag tgggattggg agcatttgat catgtccaat ccgtcaagga 180 

ctgaagatga cagcaaacag ctacctactg agtgggaaat tgaaaaaggt gaaggaattg 240

aatctatagt tccacatttc tcaggccttg agagagtcag tagtggctct gccaccagct 300 

tctggcacac tgctgtatcg aaaagctcac agtcgacctc tatcaactca tcatctcccg 360 

aagccaaacg atgcaagctt gcatcagaaa gttcccctgg agattcttgc agcaacatag 420 

actttgtcca ggtgaaggct cccacagctc tcgaggtatc cgttgcctca gctgaatcag 480 

atctttgttt aaaactagga aagcggacat actctgaaga atactggggt agaaacaata 540 

atgaaatttc agcggtttct atgaagttgt taactccatc tgttgtcgct gggaaatcca 600 

aattgtgtgg tcagagcatg ccagtcccgc gttgccaaat tgatggctgt gaactggatc 660 

tctcatctgc taagggttat catcgtaagc acaaagtctg cgaaaagcat tcaaagtgcc 720 

caaaagttag cgtgagtggc ctggaacgtc ggttctgcca acagtgtagc aggttccatg 780 

ctgtctctga atttgatgag aagaaacgaa gctgccgaaa acgtctttct catcataatg 840 

cgaggcgtcg taagccacaa ggagtatttt caatgaatcc cgagagggtg tatgatcgaa 900
gacagcatac aaatatgttg tggaatgggg tgtcccttaa cgcgagatct gaagaaatgt 960
atgaatgggg taataacact tatgatacaa agcctagaca aacggaaaaa agctttactc 1020 
tgagcttcca gagaggtaat ggctctgagg accagctggt tgctagtagc agccgtatgt 1080 
tctctacatc tcaaacctca ggtgggttcc cagcaggaaa gtccaagttt caacttcatg 1140 
gcgaagatgt gggagaatac tcaggagtcc tccatgaatc tcaagatatc caccgtgctc 1200 
tctctcttct gtcaacctct tcggatcccc tggcccaacc acatgtgcag ccattttctc 1260 
tactctgttc atatgatgtt gtaccaaaat agatgagtaa gtaatgtgta atttgtaaac 1320 
ctgttactca gttggtggat acttttccaa acctatgata aaaacctcgt cctagacc 
(2) INFORMATION FOR SEQ ID NO: 167: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 393 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..393 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581927 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167: 



Met Asp Cys Asn Met Val Ser Ser Ser Gln Trp Asp Trp Glu His Leu
1               5                   10                  15

Ile Met Ser Asn Pro Ser Arg Thr Glu Asp Asp Ser Lys Gln Leu Pro
            20                  25                  30

Thr Glu Trp Glu Ile Glu Lys Gly Glu Gly Ile Glu Ser Ile Val Pro
        35                  40                  45

His Phe Ser Gly Leu Glu Arg Val Ser Ser Gly Ser Ala Thr Ser Phe
    50                  55                  60

Trp His Thr Ala Val Ser Lys Ser Ser Gln Ser Thr Ser Ile Asn Ser
65                  70                  75                  80

Ser Ser Pro Glu Ala Lys Arg Cys Lys Leu Ala Ser Glu Ser Ser Pro
                85                  90                  95

Gly Asp Ser Cys Ser Asn Ile Asp Phe Val Gln Val Lys Ala Pro Thr
            100                 105                 110

Ala Leu Glu Val Ser Val Ala Ser Ala Glu Ser Asp Leu Cys Leu Lys
        115                 120                 125

Leu Gly Lys Arg Thr Tyr Ser Glu Glu Tyr Trp Gly Arg Asn Asn Asn
    130                 135                 140

Glu Ile Ser Ala Val Ser Met Lys Leu Leu Thr Pro Ser Val Val Ala
145                 150                 155                 160

Gly Lys Ser Lys Leu Cys Gly Gln Ser Met Pro Val Pro Arg Cys Gln
                165                 170                 175

Ile Asp Gly Cys Glu Leu Asp Leu Ser Ser Ala Lys Gly Tyr His Arg
            180                 185                 190

Lys His Lys Val Cys Glu Lys His Ser Lys Cys Pro Lys Val Ser Val
        195                 200                 205

Ser Gly Leu Glu Arg Arg Phe Cys Gln Gln Cys Ser Arg Phe His Ala
    210                 215                 220

Val Ser Glu Phe Asp Glu Lys Lys Arg Ser Cys Arg Lys Arg Leu Ser
225                 230                 235                 240

His His Asn Ala Arg Arg Arg Lys Pro Gln Gly Val Phe Ser Met Asn
                245                 250                 255

Pro Glu Arg Val Tyr Asp Arg Arg Gln His Thr Asn Met Leu Trp Asn
            260                 265                 270

Gly Val Ser Leu Asn Ala Arg Ser Glu Glu Met Tyr Glu Trp Gly Asn
        275                 280                 285

Asn Thr Tyr Asp Thr Lys Pro Arg Gln Thr Glu Lys Ser Phe Thr Leu
    290                 295                 300

Ser Phe Gln Arg Gly Asn Gly Ser Glu Asp Gln Leu Val Ala Ser Ser
305                 310                 315                 320

Ser Arg Met Phe Ser Thr Ser Gln Thr Ser Gly Gly Phe Pro Ala Gly
                325                 330                 335

Lys Ser Lys Phe Gln Leu His Gly Glu Asp Val Gly Glu Tyr Ser Gly
            340                 345                 350

Val Leu His Glu Ser Gln Asp Ile His Arg Ala Leu Ser Leu Leu Ser
        355                 360                 365

Thr Ser Ser Asp Pro Leu Ala Gln Pro His Val Gln Pro Phe Ser Leu
    370                 375                 380

Leu Cys Ser Tyr Asp Val Val Pro Lys
385                 390
(2) INFORMATION FOR SEQ ID NO: 168: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 389 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..389 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581928 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168: 
Met Val Ser Ser Ser Gln Trp Asp Trp Glu His Leu Ile Met Ser Asn
1               5                   10                  15

Pro Ser Arg Thr Glu Asp Asp Ser Lys Gln Leu Pro Thr Glu Trp Glu
            20                  25                  30

Ile Glu Lys Gly Glu Gly Ile Glu Ser Ile Val Pro His Phe Ser Gly
        35                  40                  45

Leu Glu Arg Val Ser Ser Gly Ser Ala Thr Ser Phe Trp His Thr Ala
    50                  55                  60

Val Ser Lys Ser Ser Gln Ser Thr Ser Ile Asn Ser Ser Ser Pro Glu
65                  70                  75                  80

Ala Lys Arg Cys Lys Leu Ala Ser Glu Ser Ser Pro Gly Asp Ser Cys
                85                  90                  95

Ser Asn Ile Asp Phe Val Gln Val Lys Ala Pro Thr Ala Leu Glu Val
            100                 105                 110

Ser Val Ala Ser Ala Glu Ser Asp Leu Cys Leu Lys Leu Gly Lys Arg
        115                 120                 125

Thr Tyr Ser Glu Glu Tyr Trp Gly Arg Asn Asn Asn Glu Ile Ser Ala
    130                 135                 140

Val Ser Met Lys Leu Leu Thr Pro Ser Val Val Ala Gly Lys Ser Lys
145                 150                 155                 160

Leu Cys Gly Gln Ser Met Pro Val Pro Arg Cys Gln Ile Asp Gly Cys
                165                 170                 175

Glu Leu Asp Leu Ser Ser Ala Lys Gly Tyr His Arg Lys His Lys Val
            180                 185                 190

Cys Glu Lys His Ser Lys Cys Pro Lys Val Ser Val Ser Gly Leu Glu
        195                 200                 205

Arg Arg Phe Cys Gln Gln Cys Ser Arg Phe His Ala Val Ser Glu Phe
    210                 215                 220

Asp Glu Lys Lys Arg Ser Cys Arg Lys Arg Leu Ser His His Asn Ala
225                 230                 235                 240

Arg Arg Arg Lys Pro Gln Gly Val Phe Ser Met Asn Pro Glu Arg Val
                245                 250                 255

Tyr Asp Arg Arg Gln His Thr Asn Met Leu Trp Asn Gly Val Ser Leu
            260                 265                 270

Asn Ala Arg Ser Glu Glu Met Tyr Glu Trp Gly Asn Asn Thr Tyr Asp
        275                 280                 285

Thr Lys Pro Arg Gln Thr Glu Lys Ser Phe Thr Leu Ser Phe Gln Arg
    290                 295                 300

Gly Asn Gly Ser Glu Asp Gln Leu Val Ala Ser Ser Ser Arg Met Phe
305                 310                 315                 320

Ser Thr Ser Gln Thr Ser Gly Gly Phe Pro Ala Gly Lys Ser Lys Phe
                325                 330                 335

Gln Leu His Gly Glu Asp Val Gly Glu Tyr Ser Gly Val Leu His Glu
            340                 345                 350

Ser Gln Asp Ile His Arg Ala Leu Ser Leu Leu Ser Thr Ser Ser Asp
        355                 360                 365

Pro Leu Ala Gln Pro His Val Gln Pro Phe Ser Leu Leu Cys Ser Tyr
    370                 375                 380

Asp Val Val Pro Lys
385

(2) INFORMATION FOR SEQ ID NO: 169: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 376 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..376 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581929 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169: 
Met Ser Asn Pro Ser Arg Thr Glu Asp Asp Ser Lys Gln Leu Pro Thr
1               5                   10                  15

Glu Trp Glu Ile Glu Lys Gly Glu Gly Ile Glu Ser Ile Val Pro His
            20                  25                  30

Phe Ser Gly Leu Glu Arg Val Ser Ser Gly Ser Ala Thr Ser Phe Trp
        35                  40                  45

His Thr Ala Val Ser Lys Ser Ser Gln Ser Thr Ser Ile Asn Ser Ser
    50                  55                  60

Ser Pro Glu Ala Lys Arg Cys Lys Leu Ala Ser Glu Ser Ser Pro Gly
65                  70                  75                  80

Asp Ser Cys Ser Asn Ile Asp Phe Val Gln Val Lys Ala Pro Thr Ala
                85                  90                  95

Leu Glu Val Ser Val Ala Ser Ala Glu Ser Asp Leu Cys Leu Lys Leu
            100                 105                 110

Gly Lys Arg Thr Tyr Ser Glu Glu Tyr Trp Gly Arg Asn Asn Asn Glu
        115                 120                 125

Ile Ser Ala Val Ser Met Lys Leu Leu Thr Pro Ser Val Val Ala Gly
    130                 135                 140

Lys Ser Lys Leu Cys Gly Gln Ser Met Pro Val Pro Arg Cys Gln Ile
145                 150                 155                 160

Asp Gly Cys Glu Leu Asp Leu Ser Ser Ala Lys Gly Tyr His Arg Lys
                165                 170                 175

His Lys Val Cys Glu Lys His Ser Lys Cys Pro Lys Val Ser Val Ser
            180                 185                 190

Gly Leu Glu Arg Arg Phe Cys Gln Gln Cys Ser Arg Phe His Ala Val
        195                 200                 205

Ser Glu Phe Asp Glu Lys Lys Arg Ser Cys Arg Lys Arg Leu Ser His
    210                 215                 220

His Asn Ala Arg Arg Arg Lys Pro Gln Gly Val Phe Ser Met Asn Pro
225                 230                 235                 240

Glu Arg Val Tyr Asp Arg Arg Gln His Thr Asn Met Leu Trp Asn Gly
                245                 250                 255

Val Ser Leu Asn Ala Arg Ser Glu Glu Met Tyr Glu Trp Gly Asn Asn
            260                 265                 270

Thr Tyr Asp Thr Lys Pro Arg Gln Thr Glu Lys Ser Phe Thr Leu Ser
        275                 280                 285

Phe Gln Arg Gly Asn Gly Ser Glu Asp Gln Leu Val Ala Ser Ser Ser
    290                 295                 300

Arg Met Phe Ser Thr Ser Gln Thr Ser Gly Gly Phe Pro Ala Gly Lys
305                 310                 315                 320

Ser Lys Phe Gln Leu His Gly Glu Asp Val Gly Glu Tyr Ser Gly Val
                325                 330                 335

Leu His Glu Ser Gln Asp Ile His Arg Ala Leu Ser Leu Leu Ser Thr
            340                 345                 350

Ser Ser Asp Pro Leu Ala Gln Pro His Val Gln Pro Phe Ser Leu Leu
        355                 360                 365

Cys Ser Tyr Asp Val Val Pro Lys
    370                 375
(2) INFORMATION FOR SEQ ID NO: 170: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1017 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1017 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581971 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170: 
gaatcgaaac agatcggaaa atcgtcgaga gagagagaga gagagagaga atgtcgtatc 60 
ttcttccaca tctgcactcc ggttgggctg ttgatcagtc gattctggcc gaggaagagc 120 
gtctcgtcgt cattcgtttc ggccatgact gggatgagac ctgtatgcag atggatgagg 180 
tgcttgcgtc tgttgctgag acgattaaga actttgcagt catttatctg gtggacatca 240 
ctgaggttcc agacttcaac accatgtacg agctgtacga tccttctacg gtcatgttct 300 
tcttcaggaa caagcacatc atgatcgatc ttggaactgg taacaacaac aagatcaact 360 
gggctctcaa ggacaagcag gagttcattg atatcattga gactgtctac cgtggtgcaa 420 
ggaagggtcg tggattggtg attgctccaa aagattactc caccaaatac cgttactaat 480 
cgagcttccc aacactatct agtttgttaa aaccattgag tctagtgatt ctggtcagct 540 
gaaatatccc gtgaactggt ttcatttcat atatgctttg atgatgattg tgattctggt 600 
gcctctgttc ctcgtcccgc tcatcaattt aatgcctcga atcatcgatt atttcatggc 660 
taaattgtac gcgtggctcg gatgggagta caggaagcca gcgagagttc ctccagcttg 720 
tcctttcaag ccagttgcta aaaacgacaa cgcgaccaaa gtgggtgcgg aaactggaac 780 
tgaaggtaca gaaacaatag ctaaacctgg cgttgtagag gagacgggtg gtattaagca 840 
ggattgaggc cgagtctttt cttcaataaa accttttgta tttgtggata tgttttggat 900 
aagagaagaa gctggaagtt gaaattacct gatgtgatac tgataatgaa tgatgtacgt 960 
tgaaaatgct acgtaacaga taattttctt atatgtttac ttctctctct ctctctc 
(2) INFORMATION FOR SEQ ID NO: 171: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 107 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..107 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581972 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171: 
Glu Ser Lys Gln Ile Gly Lys Ser Ser Arg Glu Arg Glu Arg Glu Arg 
1 5 10 15 

Glu Cys Arg Ile Phe Phe His Ile Cys Thr Pro Val Gly Leu Leu Ile 
20 25 30 

Ser Arg Phe Trp Pro Arg Lys Ser Val Ser Ser Ser Phe Val Ser Ala 
35 40 45 

Met Thr Gly Met Arg Pro Val Cys Arg Trp Met Arg Cys Leu Arg Leu 
50 55 60 

Leu Leu Arg Arg Leu Arg Thr Leu Gln Ser Phe Ile Trp Trp Thr Ser 
65 70 75 80 

Leu Arg Phe Gln Thr Ser Thr Pro Cys Thr Ser Cys Thr Ile Leu Leu 
85 90 95 

Arg Ser Cys Ser Ser Ser Gly Thr Ser Thr Ser 
100 105 
(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 158 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..158 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581973 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: 
Ile Glu Thr Asp Arg Lys Ile Val Glu Arg Glu Arg Glu Arg Glu Arg 
1 5 10 15 

Met Ser Tyr Leu Leu Pro His Leu His Ser Gly Trp Ala Val Asp Gln 
20 25 30 

Ser Ile Leu Ala Glu Glu Glu Arg Leu Val Val Ile Arg Phe Gly His 
35 40 45 

Asp Trp Asp Glu Thr Cys Met Gln Met Asp Glu Val Leu Ala Ser Val 
50 55 60 

Ala Glu Thr Ile Lys Asn Phe Ala Val Ile Tyr Leu Val Asp Ile Thr 
65 70 75 80 

Glu Val Pro Asp Phe Asn Thr Met Tyr Glu Leu Tyr Asp Pro Ser Thr 
85 90 95 

Val Met Phe Phe Phe Arg Asn Lys His Ile Met Ile Asp Leu Gly Thr 
100 105 110 

Gly Asn Asn Asn Lys Ile Asn Trp Ala Leu Lys Asp Lys Gln Glu Phe 
115 120 125 

Ile Asp Ile Ile Glu Thr Val Tyr Arg Gly Ala Arg Lys Gly Arg Gly 
130 135 140 

Leu Val Ile Ala Pro Lys Asp Tyr Ser Thr Lys Tyr Arg Tyr 
145 150 155 

(2) INFORMATION FOR SEQ ID NO: 173: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..142 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581974 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173: 
Met Ser Tyr Leu Leu Pro His Leu His Ser Gly Trp Ala Val Asp Gln 
1 5 10 15 

Ser Ile Leu Ala Glu Glu Glu Arg Leu Val Val Ile Arg Phe Gly His 
20 25 30 

Asp Trp Asp Glu Thr Cys Met Gln Met Asp Glu Val Leu Ala Ser Val 
35 40 45 

Ala Glu Thr Ile Lys Asn Phe Ala Val Ile Tyr Leu Val Asp Ile Thr 
50 55 60 

Glu Val Pro Asp Phe Asn Thr Met Tyr Glu Leu Tyr Asp Pro Ser Thr 
65 70 75 80 

Val Met Phe Phe Phe Arg Asn Lys His Ile Met Ile Asp Leu Gly Thr 
85 90 95 

Gly Asn Asn Asn Lys Ile Asn Trp Ala Leu Lys Asp Lys Gln Glu Phe 
100 105 110 

Ile Asp Ile Ile Glu Thr Val Tyr Arg Gly Ala Arg Lys Gly Arg Gly 
115 120 125 

Leu Val Ile Ala Pro Lys Asp Tyr Ser Thr Lys Tyr Arg Tyr 
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 174: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1230 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1230 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581981 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174: 
attacagcat cgtaagccaa ccactcttaa actaaaagtt gtctttctaa tcaaaagcca 60 
aataatagga agagtatgga agaagatgat ggagacgcgt ctacgccgtt ttggcttcaa 120 
tcacgccgca ataacactta cttccgccgc actgcaagtc tcggtggccg tacaaccacc 180 
atcgccactc aaatattctt cgccggaaca gctgcaatcc tcatagtcgt cttcattatc 240 
cctcctttct tctcctctgt ttctcagatt ttccgacctc atttagtccg taaaagctgg 300 
gattatctca acttcgttct cgtccttttc gccgtccttt gcggcttcct cagccgcaac 360 
accaataatg acgaatccaa tcatcacaaa gaagaagaca ttcgtaacaa attctcgact 420 
tcaccatcga ttattgatcg aagaagtcgt gtatctaaca gtggtacaac gccgcgttat 480 
tggaacgatg atcgcggtgg tggcggcggt gatcaaacgg tgtacaagag gtttagtaga 540 
ttacgaagtg ttagctcgta tccagatctg aggctccggg aatacgaagc cgatgaacgg 600 
tggagattct acgatgatac acgtgtaagt caatgccgtt atgaagatgt agatccgatc 660 
tatccaaatc aaagttacag aaactggcat gaggaaggta aaccaccgcc ggaagatgta 720 
gatcaaacag aggacggtga taatggagaa ggaagtaagg tccgtaacgg cggttcggaa 780 
actgagaaag ttgaggtggt tgcgacggcg gaagctgaag tagtagaaga gctaaaagtg 840 
ccttctgctc cgccgtatat tccgtctcct ccgccgtctc cgccacgtcc tccaccagcg 900 
aagcaagcga agagaaagac taatagagtg taccaagatg tttctccaca ggaagagaag 960 
aaagaaagag atgattttgt agcgacgacg acgccgattc cacctccggc gactgtgtat 1020 
caaaagagca ataaacagga gaagaagaaa ggaggagcaa cgaaagactt tctgattgcg 1080 
ttacggagaa agaagaagaa gcagagacaa cagagcatcg atggcctcga tctcctcttc 1140 
ggctccgatc ctccattggt ctattcacca ccaccgccgc cgcctcctcc accacctttc 1200 
ttccaagggc ttttctcatc caaaaaaggt 
(2) INFORMATION FOR SEQ ID NO: 175: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..385 

(D) OTHER INFORMATION: / Ceres Seq. ID 1581982 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175: 
Met Glu Glu Asp Asp Gly Asp Ala Ser Thr Pro Phe Trp Leu Gln Ser 
1 5 10 15 

Arg Arg Asn Asn Thr Tyr Phe Arg Arg Thr Ala Ser Leu Gly Gly Arg 
20 25 30 

Thr Thr Thr Ile Ala Thr Gln Ile Phe Phe Ala Gly Thr Ala Ala Ile 
35 40 45 

Leu Ile Val Val Phe Ile Ile Pro Pro Phe Phe Ser Ser Val Ser Gln 
50 55 60 

Ile Phe Arg Pro His Leu Val Arg Lys Ser Trp Asp Tyr Leu Asn Phe 
65 70 75 80 

Val Leu Val Leu Phe Ala Val Leu Cys Gly Phe Leu Ser Arg Asn Thr 
85 90 95 

Asn Asn Asp Glu Ser Asn His His Lys Glu Glu Asp Ile Arg Asn Lys 
100 105 110 

Phe Ser Thr Ser Pro Ser Ile Ile Asp Arg Arg Ser Arg Val Ser Asn 
115 120 125 

Ser Gly Thr Thr Pro Arg Tyr Trp Asn Asp Asp Arg Gly Gly Gly Gly 
130 135 140 

Gly Asp Gln Thr Val Tyr Lys Arg Phe Ser Arg Leu Arg Ser Val Ser 
145 150 155 160 

Ser Tyr Pro Asp Leu Arg Leu Arg Glu Tyr Glu Ala Asp Glu Arg Trp 
165 170 175 

Arg Phe Tyr Asp Asp Thr Arg Val Ser Gln Cys Arg Tyr Glu Asp Val 
180 185 190 

Asp Pro Ile Tyr Pro Asn Gln Ser Tyr Arg Asn Trp His Glu Glu Gly 
195 200 205 

Lys Pro Pro Pro Glu Asp Val Asp Gln Thr Glu Asp Gly Asp Asn Gly 
210 215 220 

Glu Gly Ser Lys Val Arg Asn Gly Gly Ser Glu Thr Glu Lys Val Glu 
225 230 235 240 

Val Val Ala Thr Ala Glu Ala Glu Val Val Glu Glu Leu Lys Val Pro 
245 250 255 

Ser Ala Pro Pro Tyr Ile Pro Ser Pro Pro Pro Ser Pro Pro Arg Pro 
260 265 270 

Pro Pro Ala Lys Gln Ala Lys Arg Lys Thr Asn Arg Val Tyr Gln Asp 
275 280 285 

Val Ser Pro Gln Glu Glu Lys Lys Glu Arg Asp Asp Phe Val Ala Thr 
290 295 300 

Thr Thr Pro Ile Pro Pro Pro Ala Thr Val Tyr Gln Lys Ser Asn Lys 
305 310 315 320 

Gln Glu Lys Lys Lys Gly Gly Ala Thr Lys Asp Phe Leu Ile Ala Leu 
325 330 335 

Arg Arg Lys Lys Lys Lys Gln Arg Gln Gln Ser Ile Asp Gly Leu Asp 
340 345 350 

Leu Leu Phe Gly Ser Asp Pro Pro Leu Val Tyr Ser Pro Pro Pro Pro 
355 360 365 

Pro Pro Pro Pro Pro Pro Phe Phe Gln Gly Leu Phe Ser Ser Lys Lys 
370 375 380 

Gly 
385 

(2) INFORMATION FOR SEQ ID NO: 176: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1239 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1239 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582014 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176: 
tgagtaatag tgagcaaaat gcaagagcta gtgtgaacaa gaatggatta agagacttga 60 
ggaacaaatc tggttctgat gttcttccat ctaattcaac tccgacaaga aaaagtaaca 120 
tctttagaaa gaaaacctct gatggtgaga gcagctcttc gagtagaggg aataagacgg 180 
aaggatcagt ggttggggga aagaatatta gttcccctca ggggaatggc atcaccatgt 240 
ctgaacctag gaggaacaga aacttaccaa gtgttaggga caacagtgtt gtttcaagta 300 
gtactaggag atcaactggt tattatggta gaacaggacg tgctggagcg gttgcaacac 360 
tacaagcacc tcggcctcca acaagagctg atctcaatcc ttctagatcg gcagaagctt 420 
cgcgtagtcc tttaaatagt tacagtaggc caatcagtag taatggcagg ttacgtagcc 480 
tgatgatgcc tggtagcccc tcagaagccg gcctttctcg ctctttgatg aaccgtgata 540 
cttacagacg gtataacatg aatggagttg cagaggtatt gttggccctg gaaaggattg 600 
agcaagacga agagcttaca tacgagcaat tggctgtttt agagaccaat ctgttcttaa 660 
atggtatgag cagcttccac gatcagcata gagatatgag gcttgacatt gacaacatgt 720 
cgtatgagga actgttagca ttagaggaga aaatgggtac agtgagcact gctctaagcg 780 
aagaagcgct tctgaaaagc ctcaagtcaa gtatttaccg tccaaacgat gaatccgacg 840 
acatttgcct gaacaaagat gatgatgtca agtgcagcat ttgccaggaa gagtatgttg 900 
atggagatga agtagggact ttgccttgcc aacataaata ccacgtgagc tgcgcgcaac 960 
aatggctacg gatgaagaat tggtgtccta tatgtaaaac ctctgcagaa tctcagccac 1020 
atccattttc atgatgatgg gattgcagag acatttacca aagtgtttgt gtctacttcc 1080 
actaccttgc tctctattat ttgtacataa aaggttttct tttttactta cacacaattg 1140 
ggaacaaata gaaaataaga tcaggtggac aaattattgt tttcaccttg tattggttcg 1200 
taatttcaac aatatttggt ttaattccat atatagttt 
(2) INFORMATION FOR SEQ ID NO: 177: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..343 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582015 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177: 
Ser Asn Ser Glu Gln Asn Ala Arg Ala Ser Val Asn Lys Asn Gly Leu 
1 5 10 15 

Arg Asp Leu Arg Asn Lys Ser Gly Ser Asp Val Leu Pro Ser Asn Ser 
20 25 30 

Thr Pro Thr Arg Lys Ser Asn Ile Phe Arg Lys Lys Thr Ser Asp Gly 
35 40 45 

Glu Ser Ser Ser Ser Ser Arg Gly Asn Lys Thr Glu Gly Ser Val Val 
50 55 60 

Gly Gly Lys Asn Ile Ser Ser Pro Gln Gly Asn Gly Ile Thr Met Ser 
65 70 75 80 

Glu Pro Arg Arg Asn Arg Asn Leu Pro Ser Val Arg Asp Asn Ser Val 
85 90 95 

Val Ser Ser Ser Thr Arg Arg Ser Thr Gly Tyr Tyr Gly Arg Thr Gly 
100 105 110 

Arg Ala Gly Ala Val Ala Thr Leu Gln Ala Pro Arg Pro Pro Thr Arg 
115 120 125 

Ala Asp Leu Asn Pro Ser Arg Ser Ala Glu Ala Ser Arg Ser Pro Leu 
130 135 140 

Asn Ser Tyr Ser Arg Pro Ile Ser Ser Asn Gly Arg Leu Arg Ser Leu 
145 150 155 160 

Met Met Pro Gly Ser Pro Ser Glu Ala Gly Leu Ser Arg Ser Leu Met 
165 170 175 

Asn Arg Asp Thr Tyr Arg Arg Tyr Asn Met Asn Gly Val Ala Glu Val 
180 185 190 

Leu Leu Ala Leu Glu Arg Ile Glu Gln Asp Glu Glu Leu Thr Tyr Glu 
195 200 205 

Gln Leu Ala Val Leu Glu Thr Asn Leu Phe Leu Asn Gly Met Ser Ser 
210 215 220 

Phe His Asp Gln His Arg Asp Met Arg Leu Asp Ile Asp Asn Met Ser 
225 230 235 240 

Tyr Glu Glu Leu Leu Ala Leu Glu Glu Lys Met Gly Thr Val Ser Thr 
245 250 255 

Ala Leu Ser Glu Glu Ala Leu Leu Lys Ser Leu Lys Ser Ser Ile Tyr 
260 265 270 

Arg Pro Asn Asp Glu Ser Asp Asp Ile Cys Leu Asn Lys Asp Asp Asp 
275 280 285 

Val Lys Cys Ser Ile Cys Gln Glu Glu Tyr Val Asp Gly Asp Glu Val 
290 295 300 

Gly Thr Leu Pro Cys Gln His Lys Tyr His Val Ser Cys Ala Gln Gln 
305 310 315 320 

Trp Leu Arg Met Lys Asn Trp Cys Pro Ile Cys Lys Thr Ser Ala Glu 
325 330 335 

Ser Gln Pro His Pro Phe Ser 
340 

(2) INFORMATION FOR SEQ ID NO: 178: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 265 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..265 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582016 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178: 



Met Ser Glu Pro Arg Arg Asn Arg Asn Leu Pro Ser Val Arg Asp Asn 
1 5 10 15 

Ser Val Val Ser Ser Ser Thr Arg Arg Ser Thr Gly Tyr Tyr Gly Arg 
20 25 30 

Thr Gly Arg Ala Gly Ala Val Ala Thr Leu Gln Ala Pro Arg Pro Pro 
35 40 45 

Thr Arg Ala Asp Leu Asn Pro Ser Arg Ser Ala Glu Ala Ser Arg Ser 
50 55 60 

Pro Leu Asn Ser Tyr Ser Arg Pro Ile Ser Ser Asn Gly Arg Leu Arg 
65 70 75 80 

Ser Leu Met Met Pro Gly Ser Pro Ser Glu Ala Gly Leu Ser Arg Ser 
85 90 95 

Leu Met Asn Arg Asp Thr Tyr Arg Arg Tyr Asn Met Asn Gly Val Ala 
100 105 110 

Glu Val Leu Leu Ala Leu Glu Arg Ile Glu Gln Asp Glu Glu Leu Thr 
115 120 125 

Tyr Glu Gln Leu Ala Val Leu Glu Thr Asn Leu Phe Leu Asn Gly Met 
130 135 140 

Ser Ser Phe His Asp Gln His Arg Asp Met Arg Leu Asp Ile Asp Asn 
145 150 155 160 

Met Ser Tyr Glu Glu Leu Leu Ala Leu Glu Glu Lys Met Gly Thr Val 
165 170 175 

Ser Thr Ala Leu Ser Glu Glu Ala Leu Leu Lys Ser Leu Lys Ser Ser 
180 185 190 

Ile Tyr Arg Pro Asn Asp Glu Ser Asp Asp Ile Cys Leu Asn Lys Asp 
195 200 205 

Asp Asp Val Lys Cys Ser Ile Cys Gln Glu Glu Tyr Val Asp Gly Asp 
210 215 220 

Glu Val Gly Thr Leu Pro Cys Gln His Lys Tyr His Val Ser Cys Ala 
225 230 235 240 

Gln Gln Trp Leu Arg Met Lys Asn Trp Cys Pro Ile Cys Lys Thr Ser 
245 250 255 

Ala Glu Ser Gln Pro His Pro Phe Ser 
260 265 

(2) INFORMATION FOR SEQ ID NO: 179: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..183 
(D) OTHER INFORMATION: / Ceres Seq. ID 1582017 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179: 
Met Met Pro Gly Ser Pro Ser Glu Ala Gly Leu Ser Arg Ser Leu Met 
1 5 10 15 

Asn Arg Asp Thr Tyr Arg Arg Tyr Asn Met Asn Gly Val Ala Glu Val 
20 25 30 

Leu Leu Ala Leu Glu Arg Ile Glu Gln Asp Glu Glu Leu Thr Tyr Glu 
35 40 45 

Gln Leu Ala Val Leu Glu Thr Asn Leu Phe Leu Asn Gly Met Ser Ser 
50 55 60 

Phe His Asp Gln His Arg Asp Met Arg Leu Asp Ile Asp Asn Met Ser 
65 70 75 80 

Tyr Glu Glu Leu Leu Ala Leu Glu Glu Lys Met Gly Thr Val Ser Thr 
85 90 95 

Ala Leu Ser Glu Glu Ala Leu Leu Lys Ser Leu Lys Ser Ser Ile Tyr 
100 105 110 

Arg Pro Asn Asp Glu Ser Asp Asp Ile Cys Leu Asn Lys Asp Asp Asp 
115 120 125 

Val Lys Cys Ser Ile Cys Gln Glu Glu Tyr Val Asp Gly Asp Glu Val 
130 135 140 

Gly Thr Leu Pro Cys Gln His Lys Tyr His Val Ser Cys Ala Gln Gln 
145 150 155 160 

Trp Leu Arg Met Lys Asn Trp Cys Pro Ile Cys Lys Thr Ser Ala Glu 
165 170 175 

Ser Gln Pro His Pro Phe Ser 
180 

(2) INFORMATION FOR SEQ ID NO: 180: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1413 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1413 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582040 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180: 
agagatctaa aagacgaaag tgggttgcga cttgtgaagc atcgggagac gtgaagggaa 60 
gaaacgagga gggaacacca acgaaacgac cacaaaaaaa aaacacttaa aatccaattt 120 
caccatctct cgtccacctc cggtcgtctc gccgttctcc tcctcctcca ctatagctgc 180 
ctgtattcgt cttctctacc acactcgtta tctggatctg aaatattagg tgatacgaag 240 
ccatgacagt ttcagaagct tatagtccac ctctgttcag tatagccccg atgatgggat 300 
ggacagacaa tcactacaga actctagcgc gtcttataac aaaacacgca tggctctaca 360 
ctgaaatgtt agctgctgaa accattgtct atcaagaaga taacctggac agctttttgg 420 
cattctctcc agaccaacat cccattgttc ttcaaattgg tggaagaaac ttggaaaatt 480 
tggctaaagc aaccagactt gctaatgctt acgcctatga tgaaattaat tttaactgtg 540 
gatgtcctag cccaaaagta agtggacgag gttgttttgg tgctcttctt atgcttgacc 600 
cgaagtttgt tggtgaggct atgtctgtca ttgcagccaa taccaatgca gctgtcactg 660 
tcaaatgtcg aataggtgtt gatgatcatg attcatataa tgagctttgt gatttcattc 720 
atatagtttc ttcattatct cctactaagc atttcatcat acattcgcgg aaggcattac 780 
tgtctggact tagcccgtca gataaccgtc gaattccgcc cttaaaatac gagtttttct 840 
ttgccctttt gcgtgatttc ccgtacttaa agttcacaat taatggaggc ataaactctg 900 
tggttgaggc agatgccgca ttaaggtctg gagctcatgg cgttatgctt gggcgtgctg 960 
tatactacaa tccctggcac attttgggac acgtcgatac tgtaatttac ggatctccaa 1020 
gcagtggaat tacaaggcga caggttcttg aaaaatataa agtctatgga gagtcggttc 1080 
tcgggaaata tggaaaaggc cgaccaaatc ttcgagatat agtgaggcca ttgatcaatt 1140 
tgttccactc ggagtctgga aatggccagt ggaaacgtag aaccgatgct gctctcttgc 1200 
actgcaccac cttacaatca ttcttagacg aagtgttacc agcaataccc gactatgtcc 1260 
tcgattcatc tgctgtcaaa gaggcgactg gacgtgaaga tctatttgca gatgtacaac 1320 
gtttgttacc tcctccttac gaaaaagaat ccttgaaagc attggaaagg atgccaacga 1380 
gacctgtgat tcttgatgag gaatgacaaa tct 
(2) INFORMATION FOR SEQ ID NO: 181: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..387 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582041 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181: 



Met Thr Val Ser Glu Ala Tyr Ser Pro Pro Leu Phe Ser Ile Ala Pro 
1 5 10 15 

Met Met Gly Trp Thr Asp Asn His Tyr Arg Thr Leu Ala Arg Leu Ile 
20 25 30 

Thr Lys His Ala Trp Leu Tyr Thr Glu Met Leu Ala Ala Glu Thr Ile 
35 40 45 

Val Tyr Gln Glu Asp Asn Leu Asp Ser Phe Leu Ala Phe Ser Pro Asp 
50 55 60 

Gln His Pro Ile Val Leu Gln Ile Gly Gly Arg Asn Leu Glu Asn Leu 
65 70 75 80 

Ala Lys Ala Thr Arg Leu Ala Asn Ala Tyr Ala Tyr Asp Glu Ile Asn 
85 90 95 

Phe Asn Cys Gly Cys Pro Ser Pro Lys Val Ser Gly Arg Gly Cys Phe 
100 105 110 

Gly Ala Leu Leu Met Leu Asp Pro Lys Phe Val Gly Glu Ala Met Ser 
115 120 125 

Val Ile Ala Ala Asn Thr Asn Ala Ala Val Thr Val Lys Cys Arg Ile 
130 135 140 

Gly Val Asp Asp His Asp Ser Tyr Asn Glu Leu Cys Asp Phe Ile His 
145 150 155 160 

Ile Val Ser Ser Leu Ser Pro Thr Lys His Phe Ile Ile His Ser Arg 
165 170 175 

Lys Ala Leu Leu Ser Gly Leu Ser Pro Ser Asp Asn Arg Arg Ile Pro 
180 185 190 

Pro Leu Lys Tyr Glu Phe Phe Phe Ala Leu Leu Arg Asp Phe Pro Tyr 
195 200 205 

Leu Lys Phe Thr Ile Asn Gly Gly Ile Asn Ser Val Val Glu Ala Asp 
210 215 220 

Ala Ala Leu Arg Ser Gly Ala His Gly Val Met Leu Gly Arg Ala Val 
225 230 235 240 

Tyr Tyr Asn Pro Trp His Ile Leu Gly His Val Asp Thr Val Ile Tyr 
245 250 255 

Gly Ser Pro Ser Ser Gly Ile Thr Arg Arg Gln Val Leu Glu Lys Tyr 
260 265 270 

Lys Val Tyr Gly Glu Ser Val Leu Gly Lys Tyr Gly Lys Gly Arg Pro 
275 280 285 

Asn Leu Arg Asp Ile Val Arg Pro Leu Ile Asn Leu Phe His Ser Glu 
290 295 300 

Ser Gly Asn Gly Gln Trp Lys Arg Arg Thr Asp Ala Ala Leu Leu His 
305 310 315 320 

Cys Thr Thr Leu Gln Ser Phe Leu Asp Glu Val Leu Pro Ala Ile Pro 
325 330 335 

Asp Tyr Val Leu Asp Ser Ser Ala Val Lys Glu Ala Thr Gly Arg Glu 
340 345 350 

Asp Leu Phe Ala Asp Val Gln Arg Leu Leu Pro Pro Pro Tyr Glu Lys 
355 360 365 

Glu Ser Leu Lys Ala Leu Glu Arg Met Pro Thr Arg Pro Val Ile Leu 
370 375 380 

Asp Glu Glu 
385 

(2) INFORMATION FOR SEQ ID NO: 182: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 371 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..371 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582042 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182: 
Met Met Gly Trp Thr Asp Asn His Tyr Arg Thr Leu Ala Arg Leu Ile 
1 5 10 15 

Thr Lys His Ala Trp Leu Tyr Thr Glu Met Leu Ala Ala Glu Thr Ile 
20 25 30 

Val Tyr Gln Glu Asp Asn Leu Asp Ser Phe Leu Ala Phe Ser Pro Asp 
35 40 45 

Gln His Pro Ile Val Leu Gln Ile Gly Gly Arg Asn Leu Glu Asn Leu 
50 55 60 

Ala Lys Ala Thr Arg Leu Ala Asn Ala Tyr Ala Tyr Asp Glu Ile Asn 
65 70 75 80 

Phe Asn Cys Gly Cys Pro Ser Pro Lys Val Ser Gly Arg Gly Cys Phe 
85 90 95 

Gly Ala Leu Leu Met Leu Asp Pro Lys Phe Val Gly Glu Ala Met Ser 
100 105 110 

Val Ile Ala Ala Asn Thr Asn Ala Ala Val Thr Val Lys Cys Arg Ile 
115 120 125 

Gly Val Asp Asp His Asp Ser Tyr Asn Glu Leu Cys Asp Phe Ile His 
130 135 140 

Ile Val Ser Ser Leu Ser Pro Thr Lys His Phe Ile Ile His Ser Arg 
145 150 155 160 

Lys Ala Leu Leu Ser Gly Leu Ser Pro Ser Asp Asn Arg Arg Ile Pro 
165 170 175 

Pro Leu Lys Tyr Glu Phe Phe Phe Ala Leu Leu Arg Asp Phe Pro Tyr 
180 185 190 

Leu Lys Phe Thr Ile Asn Gly Gly Ile Asn Ser Val Val Glu Ala Asp 
195 200 205 

Ala Ala Leu Arg Ser Gly Ala His Gly Val Met Leu Gly Arg Ala Val 
210 215 220 

Tyr Tyr Asn Pro Trp His Ile Leu Gly His Val Asp Thr Val Ile Tyr 
225 230 235 240 

Gly Ser Pro Ser Ser Gly Ile Thr Arg Arg Gln Val Leu Glu Lys Tyr 
245 250 255 

Lys Val Tyr Gly Glu Ser Val Leu Gly Lys Tyr Gly Lys Gly Arg Pro 
260 265 270 

Asn Leu Arg Asp Ile Val Arg Pro Leu Ile Asn Leu Phe His Ser Glu 
275 280 285 

Ser Gly Asn Gly Gln Trp Lys Arg Arg Thr Asp Ala Ala Leu Leu His 
290 295 300 

Cys Thr Thr Leu Gln Ser Phe Leu Asp Glu Val Leu Pro Ala Ile Pro 
305 310 315 320 

Asp Tyr Val Leu Asp Ser Ser Ala Val Lys Glu Ala Thr Gly Arg Glu 
325 330 335 

Asp Leu Phe Ala Asp Val Gln Arg Leu Leu Pro Pro Pro Tyr Glu Lys 
340 345 350 

Glu Ser Leu Lys Ala Leu Glu Arg Met Pro Thr Arg Pro Val Ile Leu 
355 360 365 

Asp Glu Glu 
370 

(2) INFORMATION FOR SEQ ID NO: 183: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 370 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..370 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582043 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 
Met Gly Trp Thr Asp Asn His Tyr Arg Thr Leu Ala Arg Leu Ile Thr 
1 5 10 15 

Lys His Ala Trp Leu Tyr Thr Glu Met Leu Ala Ala Glu Thr Ile Val 
20 25 30 

Tyr Gln Glu Asp Asn Leu Asp Ser Phe Leu Ala Phe Ser Pro Asp Gln 
35 40 45 

His Pro Ile Val Leu Gln Ile Gly Gly Arg Asn Leu Glu Asn Leu Ala 
50 55 60 

Lys Ala Thr Arg Leu Ala Asn Ala Tyr Ala Tyr Asp Glu Ile Asn Phe 
65 70 75 80 

Asn Cys Gly Cys Pro Ser Pro Lys Val Ser Gly Arg Gly Cys Phe Gly 
85 90 95 

Ala Leu Leu Met Leu Asp Pro Lys Phe Val Gly Glu Ala Met Ser Val 
100 105 110 

Ile Ala Ala Asn Thr Asn Ala Ala Val Thr Val Lys Cys Arg Ile Gly 
115 120 125 

Val Asp Asp His Asp Ser Tyr Asn Glu Leu Cys Asp Phe Ile His Ile 
130 135 140 

Val Ser Ser Leu Ser Pro Thr Lys His Phe Ile Ile His Ser Arg Lys 
145 150 155 160 

Ala Leu Leu Ser Gly Leu Ser Pro Ser Asp Asn Arg Arg Ile Pro Pro 
165 170 175 

Leu Lys Tyr Glu Phe Phe Phe Ala Leu Leu Arg Asp Phe Pro Tyr Leu 
180 185 190 

Lys Phe Thr Ile Asn Gly Gly Ile Asn Ser Val Val Glu Ala Asp Ala 
195 200 205 

Ala Leu Arg Ser Gly Ala His Gly Val Met Leu Gly Arg Ala Val Tyr 
210 215 220 

Tyr Asn Pro Trp His Ile Leu Gly His Val Asp Thr Val Ile Tyr Gly 
225 230 235 240 

Ser Pro Ser Ser Gly Ile Thr Arg Arg Gln Val Leu Glu Lys Tyr Lys 
245 250 255 

Val Tyr Gly Glu Ser Val Leu Gly Lys Tyr Gly Lys Gly Arg Pro Asn 
260 265 270 

Leu Arg Asp Ile Val Arg Pro Leu Ile Asn Leu Phe His Ser Glu Ser 
275 280 285 

Gly Asn Gly Gln Trp Lys Arg Arg Thr Asp Ala Ala Leu Leu His Cys 
290 295 300 

Thr Thr Leu Gln Ser Phe Leu Asp Glu Val Leu Pro Ala Ile Pro Asp 
305 310 315 320 

Tyr Val Leu Asp Ser Ser Ala Val Lys Glu Ala Thr Gly Arg Glu Asp 
325 330 335 

Leu Phe Ala Asp Val Gln Arg Leu Leu Pro Pro Pro Tyr Glu Lys Glu 
340 345 350 

Ser Leu Lys Ala Leu Glu Arg Met Pro Thr Arg Pro Val Ile Leu Asp 
355 360 365 

Glu Glu 
370 

(2) INFORMATION FOR SEQ ID NO: 184: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1423 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1423 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582064 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: 



gatgattatt tcaacgtttt tagaggaaca cagtcaaaac caatacgagc agacacagct 60 

ccgtacgatc acaaggagct gaaggattca ctgatttctt cttcatagag aaaaaggtgg 120 

aattcgatgg tggatgcgag tcaagtagct gagctgagaa gattcgtcga gcagttgaaa 180 

ttgaatcctt cgattctcca cgatcctagt ctggttttct tcaaagaata tctccgaagt 240 

ttaggagctc aagttccgaa gattgagaaa actgaaagag attatgaaga taaggctgag 300 

acaaaaccca gtttctctcc taaacatgat gatgatgatg atattatgga gtctgatgtt 360 

gaactcgaca actctgatgt agttgaacca gacaatgagc ctcctcagcc gatgggagac 420 

cctactgctg aagtgaccga tgaaaatagg gatgatgctc agtcggagaa gagcacagct 480 

atggaggcaa tctctaatgg gaggtttgac gaagctatag agcatctaac aaaagctgtc 540 

atgctaaatc ctacttcagc gattctctac gccactagag ctagtgtgtt tttaaaagtt 600 

aagaagccta atgctgcaac tcgtgatgcc aacgtggcgt tacagttcaa ctctgattca 660 

gctaaggggt acaaatcacg aggtatggct aaggccatgt taggccaatg ggaagaggct 720 

gcagctgatc tacatgtcgc atcaaaatta gattacgatg aggagatagg gacaatgctt 780 

aagaaggttg aacccaatgc aaagagaatc gaagaacacc gcaggaaata tcaacgcctg 840 

aggaaagaaa aggagctcca aagggctgaa cgcgagagac ggaaacagca agaagcccag 900 

gagcgagaag ctcaggctgc actcaatgac ggagaagtga ttagcatcca ctcaacaagc 960 

gagctagaag caaagacaaa ggctgctaaa aaggcatcac gtctgctcat cctttacttc 1020 

acagccacat ggtgtgggcc gtgccgatat atgtctcctc tgtactcaaa cctggccaca 1080 

cagcacccga gagtcgtttt cttgaaagtt gatatagaca aggccaacga cgtggctgct 1140 

tcttggaaca ttagcagcgt cccgaccttt tgcttcatca gagatggcaa agaggtggac 1200 

aaagttgtcg gagctgataa aggttcgctt gagcagaaga ttgcacagca ctcttcttct 1260 

aagtaatgtt ttatctcttc tctcattaca ctcctctcgc ttgtactgta tctttgcaac 1320 

acaatgattg tcccgtaatt taaaagtgta gcataacttt taaagtactt gtaattggtt 1380 



agatgacaga tcagatcagt tgaataagaa gagttaagag agc
(2) INFORMATION FOR SEQ ID NO: 185:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 379 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..379

(D) OTHER INFORMATION: / Ceres Seq. ID 1582065 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: 



Met Val Asp Ala Ser Gln Val Ala Glu Leu Arg Arg Phe Val Glu Gln
1 5 10 15

Leu Lys Leu Asn Pro Ser Ile Leu His Asp Pro Ser Leu Val Phe Phe
20 25 30

Lys Glu Tyr Leu Arg Ser Leu Gly Ala Gln Val Pro Lys Ile Glu Lys
35 40 45

Thr Glu Arg Asp Tyr Glu Asp Lys Ala Glu Thr Lys Pro Ser Phe Ser
50 55 60

Pro Lys His Asp Asp Asp Asp Asp Ile Met Glu Ser Asp Val Glu Leu
65 70 75 80

Asp Asn Ser Asp Val Val Glu Pro Asp Asn Glu Pro Pro Gln Pro Met
85 90 95

Gly Asp Pro Thr Ala Glu Val Thr Asp Glu Asn Arg Asp Asp Ala Gln
100 105 110

Ser Glu Lys Ser Thr Ala Met Glu Ala Ile Ser Asn Gly Arg Phe Asp
115 120 125

Glu Ala Ile Glu His Leu Thr Lys Ala Val Met Leu Asn Pro Thr Ser
130 135 140

Ala Ile Leu Tyr Ala Thr Arg Ala Ser Val Phe Leu Lys Val Lys Lys
145 150 155 160

Pro Asn Ala Ala Thr Arg Asp Ala Asn Val Ala Leu Gln Phe Asn Ser
165 170 175

Asp Ser Ala Lys Gly Tyr Lys Ser Arg Gly Met Ala Lys Ala Met Leu
180 185 190

Gly Gln Trp Glu Glu Ala Ala Ala Asp Leu His Val Ala Ser Lys Leu
195 200 205

Asp Tyr Asp Glu Glu Ile Gly Thr Met Leu Lys Lys Val Glu Pro Asn
210 215 220

Ala Lys Arg Ile Glu Glu His Arg Arg Lys Tyr Gln Arg Leu Arg Lys
225 230 235 240

Glu Lys Glu Leu Gln Arg Ala Glu Arg Glu Arg Arg Lys Gln Gln Glu
245 250 255

Ala Gln Glu Arg Glu Ala Gln Ala Ala Leu Asn Asp Gly Glu Val Ile
260 265 270

Ser Ile His Ser Thr Ser Glu Leu Glu Ala Lys Thr Lys Ala Ala Lys
275 280 285

Lys Ala Ser Arg Leu Leu Ile Leu Tyr Phe Thr Ala Thr Trp Cys Gly
290 295 300

Pro Cys Arg Tyr Met Ser Pro Leu Tyr Ser Asn Leu Ala Thr Gln His
305 310 315 320

Pro Arg Val Val Phe Leu Lys Val Asp Ile Asp Lys Ala Asn Asp Val
325 330 335

Ala Ala Ser Trp Asn Ile Ser Ser Val Pro Thr Phe Cys Phe Ile Arg
340 345 350

Asp Gly Lys Glu Val Asp Lys Val Val Gly Ala Asp Lys Gly Ser Leu
355 360 365

Glu Gln Lys Ile Ala Gln His Ser Ser Ser Lys
370 375
(2) INFORMATION FOR SEQ ID NO: 186: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 306 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..306 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582066 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186: 
Met Glu Ser Asp Val Glu Leu Asp Asn Ser Asp Val Val Glu Pro Asp
1 5 10 15

Asn Glu Pro Pro Gln Pro Met Gly Asp Pro Thr Ala Glu Val Thr Asp
20 25 30

Glu Asn Arg Asp Asp Ala Gln Ser Glu Lys Ser Thr Ala Met Glu Ala
35 40 45

Ile Ser Asn Gly Arg Phe Asp Glu Ala Ile Glu His Leu Thr Lys Ala
50 55 60

Val Met Leu Asn Pro Thr Ser Ala Ile Leu Tyr Ala Thr Arg Ala Ser
65 70 75 80

Val Phe Leu Lys Val Lys Lys Pro Asn Ala Ala Thr Arg Asp Ala Asn
85 90 95

Val Ala Leu Gln Phe Asn Ser Asp Ser Ala Lys Gly Tyr Lys Ser Arg
100 105 110

Gly Met Ala Lys Ala Met Leu Gly Gln Trp Glu Glu Ala Ala Ala Asp
115 120 125

Leu His Val Ala Ser Lys Leu Asp Tyr Asp Glu Glu Ile Gly Thr Met
130 135 140

Leu Lys Lys Val Glu Pro Asn Ala Lys Arg Ile Glu Glu His Arg Arg
145 150 155 160

Lys Tyr Gln Arg Leu Arg Lys Glu Lys Glu Leu Gln Arg Ala Glu Arg
165 170 175

Glu Arg Arg Lys Gln Gln Glu Ala Gln Glu Arg Glu Ala Gln Ala Ala
180 185 190

Leu Asn Asp Gly Glu Val Ile Ser Ile His Ser Thr Ser Glu Leu Glu
195 200 205

Ala Lys Thr Lys Ala Ala Lys Lys Ala Ser Arg Leu Leu Ile Leu Tyr
210 215 220

Phe Thr Ala Thr Trp Cys Gly Pro Cys Arg Tyr Met Ser Pro Leu Tyr
225 230 235 240

Ser Asn Leu Ala Thr Gln His Pro Arg Val Val Phe Leu Lys Val Asp
245 250 255

Ile Asp Lys Ala Asn Asp Val Ala Ala Ser Trp Asn Ile Ser Ser Val
260 265 270

Pro Thr Phe Cys Phe Ile Arg Asp Gly Lys Glu Val Asp Lys Val Val
275 280 285

Gly Ala Asp Lys Gly Ser Leu Glu Gln Lys Ile Ala Gln His Ser Ser
290 295 300

Ser Lys
305

(2) INFORMATION FOR SEQ ID NO: 187: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 284 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..284

(D) OTHER INFORMATION: / Ceres Seq. ID 1582067
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187:
Met Gly Asp Pro Thr Ala Glu Val Thr Asp Glu Asn Arg Asp Asp Ala
1 5 10 15

Gln Ser Glu Lys Ser Thr Ala Met Glu Ala Ile Ser Asn Gly Arg Phe
20 25 30

Asp Glu Ala Ile Glu His Leu Thr Lys Ala Val Met Leu Asn Pro Thr
35 40 45

Ser Ala Ile Leu Tyr Ala Thr Arg Ala Ser Val Phe Leu Lys Val Lys
50 55 60

Lys Pro Asn Ala Ala Thr Arg Asp Ala Asn Val Ala Leu Gln Phe Asn
65 70 75 80

Ser Asp Ser Ala Lys Gly Tyr Lys Ser Arg Gly Met Ala Lys Ala Met
85 90 95

Leu Gly Gln Trp Glu Glu Ala Ala Ala Asp Leu His Val Ala Ser Lys
100 105 110

Leu Asp Tyr Asp Glu Glu Ile Gly Thr Met Leu Lys Lys Val Glu Pro
115 120 125

Asn Ala Lys Arg Ile Glu Glu His Arg Arg Lys Tyr Gln Arg Leu Arg
130 135 140

Lys Glu Lys Glu Leu Gln Arg Ala Glu Arg Glu Arg Arg Lys Gln Gln
145 150 155 160

Glu Ala Gln Glu Arg Glu Ala Gln Ala Ala Leu Asn Asp Gly Glu Val
165 170 175

Ile Ser Ile His Ser Thr Ser Glu Leu Glu Ala Lys Thr Lys Ala Ala
180 185 190

Lys Lys Ala Ser Arg Leu Leu Ile Leu Tyr Phe Thr Ala Thr Trp Cys
195 200 205

Gly Pro Cys Arg Tyr Met Ser Pro Leu Tyr Ser Asn Leu Ala Thr Gln
210 215 220

His Pro Arg Val Val Phe Leu Lys Val Asp Ile Asp Lys Ala Asn Asp
225 230 235 240

Val Ala Ala Ser Trp Asn Ile Ser Ser Val Pro Thr Phe Cys Phe Ile
245 250 255

Arg Asp Gly Lys Glu Val Asp Lys Val Val Gly Ala Asp Lys Gly Ser
260 265 270

Leu Glu Gln Lys Ile Ala Gln His Ser Ser Ser Lys
275 280










(2) INFORMATION FOR SEQ ID NO: 188:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 425 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..425

(D) OTHER INFORMATION: / Ceres Seq. ID 1582076
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:
tgttgcggtt aaacggctgg aaattacatc gaaccaaggt gctaaagagt tcgatacaga 60
gctcgagatg ctttcaaagc ttcgacatgt acacctcgtc tctctaatcg gatattgcga 120
tgacgacaac gagatggtac ttgtctatga rtatatgcca catcgntaca cttaaagatc 180
atcttttcag gagagacaag gcctctgatc ctccattgtc atggaaacga aggctagaga 240
tttgcattgg agcagctcgt ggattacagt atcttcatac tggagccaag tacacgatca 300
ttcatagaga catcaaaacc acaaacatac ttctcgatga gaacttcgtc gctaaagtat 360
ctgactttgg tttatcaaga gttggtccta ctagtgcttc tcaaacgcat gtctccaccg 420
tcgtt

(2) INFORMATION FOR SEQ ID NO: 189:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..57

(D) OTHER INFORMATION: / Ceres Seq. ID 1582077
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:
Val Ala Val Lys Arg Leu Glu Ile Thr Ser Asn Gln Gly Ala Lys Glu
1 5 10 15

Phe Asp Thr Glu Leu Glu Met Leu Ser Lys Leu Arg His Val His Leu
20 25 30

Val Ser Leu Ile Gly Tyr Cys Asp Asp Asp Asn Glu Met Val Leu Val
35 40 45

Tyr Xaa Tyr Met Pro His Xaa Tyr Thr
50 55
(2) INFORMATION FOR SEQ ID NO: 190: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 102 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 
(A) NAME/KEY: peptide

(B) LOCATION: 1..102

(D) OTHER INFORMATION: / Ceres Seq. ID 1582078
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:



Met Thr Thr Thr Arg Trp Tyr Leu Ser Met Xaa Ile Cys His Ile Xaa
1 5 10 15

Thr Leu Lys Asp His Leu Phe Arg Arg Asp Lys Ala Ser Asp Pro Pro
20 25 30

Leu Ser Trp Lys Arg Arg Leu Glu Ile Cys Ile Gly Ala Ala Arg Gly
35 40 45

Leu Gln Tyr Leu His Thr Gly Ala Lys Tyr Thr Ile Ile His Arg Asp
50 55 60

Ile Lys Thr Thr Asn Ile Leu Leu Asp Glu Asn Phe Val Ala Lys Val
65 70 75 80

Ser Asp Phe Gly Leu Ser Arg Val Gly Pro Thr Ser Ala Ser Gln Thr
85 90 95

His Val Ser Thr Val Val
100

(2) INFORMATION FOR SEQ ID NO: 191: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 93 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..93

(D) OTHER INFORMATION: / Ceres Seq. ID 1582079
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:










Met Xaa Ile Cys His Ile Xaa Thr Leu Lys Asp His Leu Phe Arg Arg
1 5 10 15

Asp Lys Ala Ser Asp Pro Pro Leu Ser Trp Lys Arg Arg Leu Glu Ile
20 25 30

Cys Ile Gly Ala Ala Arg Gly Leu Gln Tyr Leu His Thr Gly Ala Lys
35 40 45

Tyr Thr Ile Ile His Arg Asp Ile Lys Thr Thr Asn Ile Leu Leu Asp
50 55 60

Glu Asn Phe Val Ala Lys Val Ser Asp Phe Gly Leu Ser Arg Val Gly
65 70 75 80

Pro Thr Ser Ala Ser Gln Thr His Val Ser Thr Val Val
85 90

(2) INFORMATION FOR SEQ ID NO: 192:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1444 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1444 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582098 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192: 



aatcttaaca caagcttgat cgtcctcatc tgccaccaaa ccaaagacat aatttcttcg 60
gtgacttcca tggctccgcc tgtctctgat gattccctac agcctcgaga tgtttgtgtt 120
gtgggagtgg cgcgaacgcc tataggggac ttccttggct ccctctcgtc tttaactgct 180
acaagacttg ggtccatagc catccaagcc gcacttaaga gagcacatgt tgatccggcc 240
cttgtggaag aggtcttctt tggcaatgtc ttaactgcca atcttgggca agcaccagca 300
agacaggctg cacttggtgc tgggattccc tattctgtga tctgcaccac tatcaacaaa 360
gtttgtgctg caggaatgaa atctgtaatg ctagcgtctc aaagtatcca gctcggcttg 420
aatgatattg ttgttgctgg tgggatggag agcatgtcaa acgtacctaa gtacctcccg 480
gacgcaagaa ggggttctcg attaggtcat gatactgttg ttgatggtat gatgaaagat 540
ggactttggg atgtctataa tgactttgga atgggagttt gtggagaaat atgcgctgac 600
cagtaccgta ttacaagaga agaacaggat gcttatgcta tccagagctt tgagcgtggt 660
attgctgcgc aaaacactca gttgttcgct tgggaaattg ttccggtcga agtttccact 720
ggaagaggga ggccttcagt tgttattgac aaggatgaag gattggggaa gtttgatgca 780
gccaagttaa aaaagcttag accaagtttc aaggaggatg ggggatcagt cactgctgga 840
aatgcatcaa gcataagtga tggtgcggca gcgttagtgc tagtgagtgg agagaaggct 900
cttgagcttg gactgcatgt tatcgctaag attagaggat acgctgatgc tgctcaggca 960
ccagagttat tcacaaccac gccagctctt gctattccta aagctataaa gcgggctggt 1020
ttggatgcat ctcaagtgga ttattatgaa ataaacgaag cattttctgt ggtagctcta 1080
gccaatcaga aactactggg attagatcct gaacggctca atgcgcatgg aggggctgtt 1140
tcactgggac atccattggg ctgtagcggt gctcgtatct tggtcacatt attgggggtg 1200
ttgagagcaa agaagggaaa gtatggagtg gcatcaatat gcaacggagg aggaggagca 1260
tcagcacttg tccttgagtt catgtcggag aagacaatcg gatattcggc actctgaagc 1320
taatagctgt tgttttataa cgcaaagctt atgtatatgg ttcatgtgtg tgtcacaaga 1380
ctgtaacttt gtacttgaga gagatgcaat aaaagtgtgt tagaaaatga gcggtagatt 1440
tccc

(2) INFORMATION FOR SEQ ID NO: 193:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 438 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..438

(D) OTHER INFORMATION: / Ceres Seq. ID 1582099
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193:

Asn Leu Asn Thr Ser Leu Ile Val Leu Ile Cys His Gln Thr Lys Asp
1 5 10 15

Ile Ile Ser Ser Val Thr Ser Met Ala Pro Pro Val Ser Asp Asp Ser
20 25 30

Leu Gln Pro Arg Asp Val Cys Val Val Gly Val Ala Arg Thr Pro Ile
35 40 45

Gly Asp Phe Leu Gly Ser Leu Ser Ser Leu Thr Ala Thr Arg Leu Gly
50 55 60

Ser Ile Ala Ile Gln Ala Ala Leu Lys Arg Ala His Val Asp Pro Ala
65 70 75 80

Leu Val Glu Glu Val Phe Phe Gly Asn Val Leu Thr Ala Asn Leu Gly
85 90 95

Gln Ala Pro Ala Arg Gln Ala Ala Leu Gly Ala Gly Ile Pro Tyr Ser
100 105 110

Val Ile Cys Thr Thr Ile Asn Lys Val Cys Ala Ala Gly Met Lys Ser
115 120 125

Val Met Leu Ala Ser Gln Ser Ile Gln Leu Gly Leu Asn Asp Ile Val
130 135 140

Val Ala Gly Gly Met Glu Ser Met Ser Asn Val Pro Lys Tyr Leu Pro
145 150 155 160

Asp Ala Arg Arg Gly Ser Arg Leu Gly His Asp Thr Val Val Asp Gly
165 170 175

Met Met Lys Asp Gly Leu Trp Asp Val Tyr Asn Asp Phe Gly Met Gly
180 185 190

Val Cys Gly Glu Ile Cys Ala Asp Gln Tyr Arg Ile Thr Arg Glu Glu
195 200 205

Gln Asp Ala Tyr Ala Ile Gln Ser Phe Glu Arg Gly Ile Ala Ala Gln
210 215 220

Asn Thr Gln Leu Phe Ala Trp Glu Ile Val Pro Val Glu Val Ser Thr
225 230 235 240

Gly Arg Gly Arg Pro Ser Val Val Ile Asp Lys Asp Glu Gly Leu Gly
245 250 255

Lys Phe Asp Ala Ala Lys Leu Lys Lys Leu Arg Pro Ser Phe Lys Glu
260 265 270

Asp Gly Gly Ser Val Thr Ala Gly Asn Ala Ser Ser Ile Ser Asp Gly
275 280 285

Ala Ala Ala Leu Val Leu Val Ser Gly Glu Lys Ala Leu Glu Leu Gly
290 295 300

Leu His Val Ile Ala Lys Ile Arg Gly Tyr Ala Asp Ala Ala Gln Ala
305 310 315 320

Pro Glu Leu Phe Thr Thr Thr Pro Ala Leu Ala Ile Pro Lys Ala Ile
325 330 335

Lys Arg Ala Gly Leu Asp Ala Ser Gln Val Asp Tyr Tyr Glu Ile Asn
340 345 350

Glu Ala Phe Ser Val Val Ala Leu Ala Asn Gln Lys Leu Leu Gly Leu
355 360 365

Asp Pro Glu Arg Leu Asn Ala His Gly Gly Ala Val Ser Leu Gly His
370 375 380

Pro Leu Gly Cys Ser Gly Ala Arg Ile Leu Val Thr Leu Leu Gly Val
385 390 395 400

Leu Arg Ala Lys Lys Gly Lys Tyr Gly Val Ala Ser Ile Cys Asn Gly
405 410 415

Gly Gly Gly Ala Ser Ala Leu Val Leu Glu Phe Met Ser Glu Lys Thr
420 425 430

Ile Gly Tyr Ser Ala Leu
435






















(2) INFORMATION FOR SEQ ID NO: 194:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 415 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..415

(D) OTHER INFORMATION: / Ceres Seq. ID 1582100
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194:



Met Ala Pro Pro Val Ser Asp Asp Ser Leu Gln Pro Arg Asp Val Cys
1 5 10 15

Val Val Gly Val Ala Arg Thr Pro Ile Gly Asp Phe Leu Gly Ser Leu
20 25 30

Ser Ser Leu Thr Ala Thr Arg Leu Gly Ser Ile Ala Ile Gln Ala Ala
35 40 45

Leu Lys Arg Ala His Val Asp Pro Ala Leu Val Glu Glu Val Phe Phe
50 55 60

Gly Asn Val Leu Thr Ala Asn Leu Gly Gln Ala Pro Ala Arg Gln Ala
65 70 75 80

Ala Leu Gly Ala Gly Ile Pro Tyr Ser Val Ile Cys Thr Thr Ile Asn
85 90 95

Lys Val Cys Ala Ala Gly Met Lys Ser Val Met Leu Ala Ser Gln Ser
100 105 110

Ile Gln Leu Gly Leu Asn Asp Ile Val Val Ala Gly Gly Met Glu Ser
115 120 125

Met Ser Asn Val Pro Lys Tyr Leu Pro Asp Ala Arg Arg Gly Ser Arg
130 135 140

Leu Gly His Asp Thr Val Val Asp Gly Met Met Lys Asp Gly Leu Trp
145 150 155 160

Asp Val Tyr Asn Asp Phe Gly Met Gly Val Cys Gly Glu Ile Cys Ala
165 170 175

Asp Gln Tyr Arg Ile Thr Arg Glu Glu Gln Asp Ala Tyr Ala Ile Gln
180 185 190

Ser Phe Glu Arg Gly Ile Ala Ala Gln Asn Thr Gln Leu Phe Ala Trp
195 200 205

Glu Ile Val Pro Val Glu Val Ser Thr Gly Arg Gly Arg Pro Ser Val
210 215 220

Val Ile Asp Lys Asp Glu Gly Leu Gly Lys Phe Asp Ala Ala Lys Leu
225 230 235 240

Lys Lys Leu Arg Pro Ser Phe Lys Glu Asp Gly Gly Ser Val Thr Ala
245 250 255

Gly Asn Ala Ser Ser Ile Ser Asp Gly Ala Ala Ala Leu Val Leu Val
260 265 270

Ser Gly Glu Lys Ala Leu Glu Leu Gly Leu His Val Ile Ala Lys Ile
275 280 285

Arg Gly Tyr Ala Asp Ala Ala Gln Ala Pro Glu Leu Phe Thr Thr Thr
290 295 300

Pro Ala Leu Ala Ile Pro Lys Ala Ile Lys Arg Ala Gly Leu Asp Ala
305 310 315 320

Ser Gln Val Asp Tyr Tyr Glu Ile Asn Glu Ala Phe Ser Val Val Ala
325 330 335

Leu Ala Asn Gln Lys Leu Leu Gly Leu Asp Pro Glu Arg Leu Asn Ala
340 345 350

His Gly Gly Ala Val Ser Leu Gly His Pro Leu Gly Cys Ser Gly Ala
355 360 365

Arg Ile Leu Val Thr Leu Leu Gly Val Leu Arg Ala Lys Lys Gly Lys
370 375 380

Tyr Gly Val Ala Ser Ile Cys Asn Gly Gly Gly Gly Ala Ser Ala Leu
385 390 395 400

Val Leu Glu Phe Met Ser Glu Lys Thr Ile Gly Tyr Ser Ala Leu
405 410 415



(2) INFORMATION FOR SEQ ID NO: 195: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..313 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582101 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: 



Met Lys Ser Val Met Leu Ala Ser Gln Ser Ile Gln Leu Gly Leu Asn
1 5 10 15

Asp Ile Val Val Ala Gly Gly Met Glu Ser Met Ser Asn Val Pro Lys
20 25 30

Tyr Leu Pro Asp Ala Arg Arg Gly Ser Arg Leu Gly His Asp Thr Val
35 40 45

Val Asp Gly Met Met Lys Asp Gly Leu Trp Asp Val Tyr Asn Asp Phe
50 55 60

Gly Met Gly Val Cys Gly Glu Ile Cys Ala Asp Gln Tyr Arg Ile Thr
65 70 75 80

Arg Glu Glu Gln Asp Ala Tyr Ala Ile Gln Ser Phe Glu Arg Gly Ile
85 90 95

Ala Ala Gln Asn Thr Gln Leu Phe Ala Trp Glu Ile Val Pro Val Glu
100 105 110

Val Ser Thr Gly Arg Gly Arg Pro Ser Val Val Ile Asp Lys Asp Glu
115 120 125

Gly Leu Gly Lys Phe Asp Ala Ala Lys Leu Lys Lys Leu Arg Pro Ser
130 135 140

Phe Lys Glu Asp Gly Gly Ser Val Thr Ala Gly Asn Ala Ser Ser Ile
145 150 155 160

Ser Asp Gly Ala Ala Ala Leu Val Leu Val Ser Gly Glu Lys Ala Leu
165 170 175

Glu Leu Gly Leu His Val Ile Ala Lys Ile Arg Gly Tyr Ala Asp Ala
180 185 190

Ala Gln Ala Pro Glu Leu Phe Thr Thr Thr Pro Ala Leu Ala Ile Pro
195 200 205

Lys Ala Ile Lys Arg Ala Gly Leu Asp Ala Ser Gln Val Asp Tyr Tyr
210 215 220

Glu Ile Asn Glu Ala Phe Ser Val Val Ala Leu Ala Asn Gln Lys Leu
225 230 235 240

Leu Gly Leu Asp Pro Glu Arg Leu Asn Ala His Gly Gly Ala Val Ser
245 250 255

Leu Gly His Pro Leu Gly Cys Ser Gly Ala Arg Ile Leu Val Thr Leu
260 265 270

Leu Gly Val Leu Arg Ala Lys Lys Gly Lys Tyr Gly Val Ala Ser Ile
275 280 285

Cys Asn Gly Gly Gly Gly Ala Ser Ala Leu Val Leu Glu Phe Met Ser
290 295 300

Glu Lys Thr Ile Gly Tyr Ser Ala Leu
305 310
(2) INFORMATION FOR SEQ ID NO: 196: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 446 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..446

(D) OTHER INFORMATION: / Ceres Seq. ID 1582106 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196: 
aaggcaacaa acataatcaa cttaaatctt atctactttc tatttctttt taatcaaaat 60
taccgttctt aactatggcg aagtggtttt tcactatctt cttggttttt gccctagcct 120
cagctttagc ttgtggcgca agaaacgtcc cagtaggcct ctctgaccna aagaactacc 180
tcggatatgg tggcggatat tccggcgttg gagacaatgg tttacccttt ggtggcgtcg 240
gtggaggtgt gtctggtccc ggaggtaatc ttggttatgg gggatttggt ggtgctggtg 300
gcggcttagg cggtggtttg ggcggtggag caggcagtgg attaggcggt ggcttaggtg 360
gtggaagtgg aattggtgcc ggaaccagtg gaggaagtac cggagagttc atttcccttg 420
agttgttact ttgactcttt gtagtt
(2) INFORMATION FOR SEQ ID NO: 197:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 119 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..119

(D) OTHER INFORMATION: / Ceres Seq. ID 1582107
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197:
Met Ala Lys Trp Phe Phe Thr Ile Phe Leu Val Phe Ala Leu Ala Ser
1 5 10 15

Ala Leu Ala Cys Gly Ala Arg Asn Val Pro Val Gly Leu Ser Asp Xaa
20 25 30

Lys Asn Tyr Leu Gly Tyr Gly Gly Gly Tyr Ser Gly Val Gly Asp Asn
35 40 45

Gly Leu Pro Phe Gly Gly Val Gly Gly Gly Val Ser Gly Pro Gly Gly
50 55 60

Asn Leu Gly Tyr Gly Gly Phe Gly Gly Ala Gly Gly Gly Leu Gly Gly
65 70 75 80

Gly Leu Gly Gly Gly Ala Gly Ser Gly Leu Gly Gly Gly Leu Gly Gly
85 90 95

Gly Ser Gly Ile Gly Ala Gly Thr Ser Gly Gly Ser Thr Gly Glu Phe
100 105 110

Ile Ser Leu Glu Leu Leu Leu
115

(2) INFORMATION FOR SEQ ID NO:198: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 226 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..226 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582111 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198: 
attcactgat tattgtttta aggcaaatta agatcatctt ctcagatctc ttccaatttt 60 
ctagaaaaaa catgtcttgc tgtggtggaa gctgtggttg tggatctgcc tgcaagtgcg 120 
gcaatggttg cggaggttgc aaaaggtacc ctgacttgga gaacaccgcc accgagactc 180 
ttgtcctcgg tgttgctccg gcgatgaact ctcagtacga ggcttc 
(2) INFORMATION FOR SEQ ID NO: 199: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 74 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..74

(D) OTHER INFORMATION: / Ceres Seq. ID 1582112 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199: 



Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Ser Asp Leu
1 5 10 15

Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly Ser Cys Gly
20 25 30

Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly Cys Lys Arg
35 40 45

Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val Leu Gly Val
50 55 60

Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala
65 70


















(2) INFORMATION FOR SEQ ID NO: 200:















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..51 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582113 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala
50

(2) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..470 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582124 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgaaaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttaagatat ctctgcaaag ttttatcttt gtgactttat taatcctaag
(2) INFORMATION FOR SEQ ID NO: 202:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582125
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202:










Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100




















(2) INFORMATION FOR SEQ ID NO: 203:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582126
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 204:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 483 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..483

(D) OTHER INFORMATION: / Ceres Seq. ID 1582179
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tggtgggttc ttgagaacaa 480
acg

(2) INFORMATION FOR SEQ ID NO: 205:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582180
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 206:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582181
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 207:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 483 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..483

(D) OTHER INFORMATION: / Ceres Seq. ID 1582190
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207:
tcactgatta ttgttttaag gcaaattaag atcatcttca aaatcttctc agatctcttc 60
caattttcta gaaaaaacat gtcttgctgt ggtggaagct gtggttgtgg atctgcctgc 120
aagtgcggca atggttgcgg aggttgcaaa aggtaccctg acttggagaa caccgccacc 180
gagactcttg tcctcggtgt tgctccggcg atgaactctc agtacgaggc ttccggcgag 240
actttcgttg ccgagaatga tgcctgcaaa tgcggatctg actgcaagtg caacccttgt 300
acctgcaaat gaagaacttc ataaacccta agtctgtaat aaccctaatg ttatgttagg 360
tttgcttata tgtaataatt ggctgatttt tccggtagtt ttgccggcga cgttggtctt 420
tctcttgtac tgtatttcgt aatgtataat tacgctttgg aataaaaatt tgagtttgtg 480
atc

(2) INFORMATION FOR SEQ ID NO: 208:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582191
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 209: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582192 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 210:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 488 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..488 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582199 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:210: 
aattcactga ttattgtttt aaggcaaatt aagatcatct tcaaaatctt ctcagatctc 60 
ttccaatttt ctagaaaaaa catgtcttgc tgtggtggaa gctgtggttg tggatctgcc 120 
tgcaagtgcg gcaatggttg cggaggttgc aaaaggtacc ctgacttgga gaacaccgcc 180 
accgagactc ttgtcctcgg tgttgctccg gcgatgaact ctcagtacga ggcttccggc 240 
gagactttcg ttgccgagaa tgatgcctgc aaatgcggat ctgactgcaa gtgcaaccct 300 
tgtacctgca aatgaagaac ttcataaacc ctaagtctgt aataacccta atgttatgtt 360 
aggtttgctt atatgtaata attggctgat ttttccggta gttttgccgg cgacgttggt 420 
ctttctcttc ttcttcttct tctgtgtgtg tttttatggt ttatctatgt gccctcttaa 480 
tgcaatat 

(2) INFORMATION FOR SEQ ID NO: 211: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 104 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..104 

(D) OTHER INFORMATION : / Ceres Seq. ID 1582200 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: 
Asn Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile
1 5 10 15

Phe Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly
20 25 30

Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly
35 40 45

Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu
50 55 60

Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly
65 70 75 80

Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys
85 90 95

Lys Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 212:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582201
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 213:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 517 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..517

(D) OTHER INFORMATION: / Ceres Seq. ID 1582204
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcatca aaggagctca agaactagag cagaagcacc tcctcctccg ttgcacactc 480
ctcccacacc gtactttccg tttctcttct ttagact

(2) INFORMATION FOR SEQ ID NO: 214:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582205
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 215: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY: peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582206 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 216:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY: peptide 

(B) LOCATION: 1..54 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582207 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216: 
Met Leu Gly Leu Leu Ile Cys Asn Asn Trp Leu Ile Phe Pro Val Val
1 5 10 15

Leu Pro Ala Thr Leu Val Phe Leu Ile Lys Gly Ala Gln Glu Leu Glu
20 25 30

Gln Lys His Leu Leu Leu Arg Cys Thr Leu Leu Pro His Arg Thr Phe
35 40 45

Arg Phe Ser Ser Leu Asp
50

(2) INFORMATION FOR SEQ ID NO: 217: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 459 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..459 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582236 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217: 
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaraaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggtwgc ggaggwtgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgncctcggt dncgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
ttkctcttct tgattatgct ttgtgttctt caaagactt

(2) INFORMATION FOR SEQ ID NO: 218: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582237
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Xaa Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Xaa Gly Xaa
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Xaa
50 55 60

Leu Gly Xaa Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 219:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582238 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Xaa Gly Xaa Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Xaa Leu Gly Xaa Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 220:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 488 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..488

(D) OTHER INFORMATION: / Ceres Seq. ID 1582239
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttttttttt ttccggtagt tttgccggcg 420
acgttggtct ttctcttctt cttcttcttc tgtgtgtgtt tttatggttt ggtcattaag 480
atatctct

(2) INFORMATION FOR SEQ ID NO: 221:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582240
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 222: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582241
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 223:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 492 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..492

(D) OTHER INFORMATION: / Ceres Seq. ID 1582245
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcttct tcttcttctt ctgtgtcttt gtttgtaatg gattcaactt ctctttttgt 480
ttcaatgtca ag

(2) INFORMATION FOR SEQ ID NO: 224:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582246
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 225: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582247 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 502 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..502 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582248 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tggtcattaa gatatctctg 480
caaagtttta tctttggcgg ac

(2) INFORMATION FOR SEQ ID NO: 227:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582249
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 228:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582250
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 229:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2033 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..2033

(D) OTHER INFORMATION: / Ceres Seq. ID 1582254
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229:
aagtatagct caagacatgg atttgataag ctctgatgag agaaaggaag aatctttact 60
tcatccagtg gaagtagaag gaaactctgt ttctattgat cctggtgtta aatctggtgg 120
tggaggagga gaagagaaag ggtttgtgtg gtggaaaatt ccaatagagg tgttgaagta 180
ttgtgtgttg aaaattaatc ctatttggtc tttgtccatg gctgcagcat ttgtgggttt 240
tgttatgtta gggcgtagat tgtacaatat gaagaagaag actcgttcct tgcagcttaa 300
ggttctttga tgataagaag gtggcgaatc atgctgctcg gtggaacgaa gcaatctcgg 360
tagtgaaacg tgtgcccata atccggccag cacttccgtc atcagtgggt atgaaccagt 420
ggtccatgat gagtttaagg tgaaaattaa gtgaatttga gggcataata atgcatatat 480
gtgagattat atgatttgat gtggttggtg catgtatcat atgattgtat taaaaatgtt 540
acaaaaacat acaaaaagat gcttgtaagt ttgtactgtg tgtgtatata ttgttacttg 600
cttggtggaa aaaaaaaaaa aaaaattttt cttcaaattt tccccattaa acaaaaaaaa 660
atcaaatctc tctctttctc tctctaatgg cggcgacatt aggcagagac cagtatgtgt 720
acatggcgaa gctcgccgag caggcggagc gttacagaag agatggttca attcatggaa 780
cagctcgtta caggcgctac tccagcggaa gagctcaccg ttgaagagag gaatctcctc 840
tctgttgctt acaaaaacgt gatcggatct ctacgcgccg cctggaggat cgtgtcttcg 900
attgagcaga aggaagagag taggaagaac gacgagcacg tgtcgcttgt caaggattac 960
agatctaaag ttgagtctga gctttcttct gtttgctctg gaatccttaa gctccttgac 1020
tcgcatctga tcccatctgc tggagcgagt gagtctaagg tcttttactt gaagatgaaa 1080
ggtgattatc atcggtacat ggctgagttt aagtctggtg atgagagcaa agatgcaact 1140
gagcgggcta aaatattttt ccatggatgg ttcatggggc ctctaacaga aggtaaatac 1200
ccagacatca tgagggaata tgttggtgat cggcttccag agttcagtga aacagaagcc 1260
gcacttgtaa agggttcata tgattttctt ggtctcaact attacgtcac tcaatacgcc 1320
caaaataatc agacgattgt tccttcggac gtacacactg ccttgatgga ctcacgcaca 1380
actctcacat ctaaaaatgc aactggtcat gctcctggtc caccgttcaa tgcagccagt 1440
tactactacc caaaaggcat ttactacgta atggattact tcaaaaccac ttacggtgac 1500
cctttaatat atgtcactga gaatggattt agtaccccag gtgatgagga ctttgagaag 1560
gctactgccg attacaagcg gattgattat ctctgtagtc atctctgttt cctcagtaaa 1620
gtcatcaagg agaagaatgt caacgtgaaa ggatattttg cttggtctct tggggataat 1680
tacgaattct gtaacggatt taccgtcaga ttcggactaa gttacgttga tttcgcaaat 1740
atcactggtg atagagacct caaagcatct ggcaaatggt tccagaagtt cataaacgtt 1800
accgacgaag actctacgaa ccaagatcta ctccgctcaa gcgtctcctc caagaaccgt 1860
gatcggaaga gtcttgcaga tgcatgaaat atccaatcca ctatatgtcc accaagatca 1920
tcttcatgtt tcctctttct acttgctcca tagataagga gctttttcta ccatatgtat 1980
taaaataaaa tcctaataaa agatgatcaa taataataaa gactttgttt act

(2) INFORMATION FOR SEQ ID NO: 230:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 390 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..390

(D) OTHER INFORMATION: / Ceres Seq. ID 1582255
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:
Met Cys Thr Trp Arg Ser Ser Pro Ser Arg Arg Ser Val Thr Glu Glu
1 5 10 15

Met Val Gln Phe Met Glu Gln Leu Val Thr Gly Ala Thr Pro Ala Glu
20 25 30

Glu Leu Thr Val Glu Glu Arg Asn Leu Leu Ser Val Ala Tyr Lys Asn
35 40 45

Val Ile Gly Ser Leu Arg Ala Ala Trp Arg Ile Val Ser Ser Ile Glu
50 55 60

Gln Lys Glu Glu Ser Arg Lys Asn Asp Glu His Val Ser Leu Val Lys
65 70 75 80

Asp Tyr Arg Ser Lys Val Glu Ser Glu Leu Ser Ser Val Cys Ser Gly
85 90 95

Ile Leu Lys Leu Leu Asp Ser His Leu Ile Pro Ser Ala Gly Ala Ser
100 105 110

Glu Ser Lys Val Phe Tyr Leu Lys Met Lys Gly Asp Tyr His Arg Tyr
115 120 125

Met Ala Glu Phe Lys Ser Gly Asp Glu Ser Lys Asp Ala Thr Glu Arg
130 135 140

Ala Lys Ile Phe Phe His Gly Trp Phe Met Gly Pro Leu Thr Glu Gly
145 150 155 160

Lys Tyr Pro Asp Ile Met Arg Glu Tyr Val Gly Asp Arg Leu Pro Glu
165 170 175

Phe Ser Glu Thr Glu Ala Ala Leu Val Lys Gly Ser Tyr Asp Phe Leu
180 185 190

Gly Leu Asn Tyr Tyr Val Thr Gln Tyr Ala Gln Asn Asn Gln Thr Ile
195 200 205

Val Pro Ser Asp Val His Thr Ala Leu Met Asp Ser Arg Thr Thr Leu
210 215 220

Thr Ser Lys Asn Ala Thr Gly His Ala Pro Gly Pro Pro Phe Asn Ala
225 230 235 240

Ala Ser Tyr Tyr Tyr Pro Lys Gly Ile Tyr Tyr Val Met Asp Tyr Phe
245 250 255

Lys Thr Thr Tyr Gly Asp Pro Leu Ile Tyr Val Thr Glu Asn Gly Phe
260 265 270

Ser Thr Pro Gly Asp Glu Asp Phe Glu Lys Ala Thr Ala Asp Tyr Lys
275 280 285

Arg Ile Asp Tyr Leu Cys Ser His Leu Cys Phe Leu Ser Lys Val Ile
290 295 300

Lys Glu Lys Asn Val Asn Val Lys Gly Tyr Phe Ala Trp Ser Leu Gly
305 310 315 320

Asp Asn Tyr Glu Phe Cys Asn Gly Phe Thr Val Arg Phe Gly Leu Ser
325 330 335

Tyr Val Asp Phe Ala Asn Ile Thr Gly Asp Arg Asp Leu Lys Ala Ser
340 345 350

Gly Lys Trp Phe Gln Lys Phe Ile Asn Val Thr Asp Glu Asp Ser Thr
355 360 365

Asn Gln Asp Leu Leu Arg Ser Ser Val Ser Ser Lys Asn Arg Asp Arg
370 375 380

Lys Ser Leu Ala Asp Ala
385 390
(2) INFORMATION FOR SEQ ID NO: 231: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..374

(D) OTHER INFORMATION: / Ceres Seq. ID 1582256 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:231: 
Met Val Gln Phe Met Glu Gln Leu Val Thr Gly Ala Thr Pro Ala Glu
1 5 10 15

Glu Leu Thr Val Glu Glu Arg Asn Leu Leu Ser Val Ala Tyr Lys Asn
20 25 30

Val Ile Gly Ser Leu Arg Ala Ala Trp Arg Ile Val Ser Ser Ile Glu
35 40 45

Gln Lys Glu Glu Ser Arg Lys Asn Asp Glu His Val Ser Leu Val Lys
50 55 60

Asp Tyr Arg Ser Lys Val Glu Ser Glu Leu Ser Ser Val Cys Ser Gly
65 70 75 80

Ile Leu Lys Leu Leu Asp Ser His Leu Ile Pro Ser Ala Gly Ala Ser
85 90 95

Glu Ser Lys Val Phe Tyr Leu Lys Met Lys Gly Asp Tyr His Arg Tyr
100 105 110

Met Ala Glu Phe Lys Ser Gly Asp Glu Ser Lys Asp Ala Thr Glu Arg
115 120 125

Ala Lys Ile Phe Phe His Gly Trp Phe Met Gly Pro Leu Thr Glu Gly
130 135 140

Lys Tyr Pro Asp Ile Met Arg Glu Tyr Val Gly Asp Arg Leu Pro Glu
145 150 155 160

Phe Ser Glu Thr Glu Ala Ala Leu Val Lys Gly Ser Tyr Asp Phe Leu
165 170 175

Gly Leu Asn Tyr Tyr Val Thr Gln Tyr Ala Gln Asn Asn Gln Thr Ile
180 185 190

Val Pro Ser Asp Val His Thr Ala Leu Met Asp Ser Arg Thr Thr Leu
195 200 205

Thr Ser Lys Asn Ala Thr Gly His Ala Pro Gly Pro Pro Phe Asn Ala
210 215 220

Ala Ser Tyr Tyr Tyr Pro Lys Gly Ile Tyr Tyr Val Met Asp Tyr Phe
225 230 235 240

Lys Thr Thr Tyr Gly Asp Pro Leu Ile Tyr Val Thr Glu Asn Gly Phe
245 250 255

Ser Thr Pro Gly Asp Glu Asp Phe Glu Lys Ala Thr Ala Asp Tyr Lys
260 265 270

Arg Ile Asp Tyr Leu Cys Ser His Leu Cys Phe Leu Ser Lys Val Ile
275 280 285

Lys Glu Lys Asn Val Asn Val Lys Gly Tyr Phe Ala Trp Ser Leu Gly
290 295 300

Asp Asn Tyr Glu Phe Cys Asn Gly Phe Thr Val Arg Phe Gly Leu Ser
305 310 315 320

Tyr Val Asp Phe Ala Asn Ile Thr Gly Asp Arg Asp Leu Lys Ala Ser
325 330 335

Gly Lys Trp Phe Gln Lys Phe Ile Asn Val Thr Asp Glu Asp Ser Thr
340 345 350

Asn Gln Asp Leu Leu Arg Ser Ser Val Ser Ser Lys Asn Arg Asp Arg
355 360 365

Lys Ser Leu Ala Asp Ala
370

(2) INFORMATION FOR SEQ ID NO: 232:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 370 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..370

(D) OTHER INFORMATION: / Ceres Seq. ID 1582257
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:
Met Glu Gln Leu Val Thr Gly Ala Thr Pro Ala Glu Glu Leu Thr Val
1 5 10 15

Glu Glu Arg Asn Leu Leu Ser Val Ala Tyr Lys Asn Val Ile Gly Ser
20 25 30

Leu Arg Ala Ala Trp Arg Ile Val Ser Ser Ile Glu Gln Lys Glu Glu
35 40 45

Ser Arg Lys Asn Asp Glu His Val Ser Leu Val Lys Asp Tyr Arg Ser
50 55 60

Lys Val Glu Ser Glu Leu Ser Ser Val Cys Ser Gly Ile Leu Lys Leu
65 70 75 80

Leu Asp Ser His Leu Ile Pro Ser Ala Gly Ala Ser Glu Ser Lys Val
85 90 95

Phe Tyr Leu Lys Met Lys Gly Asp Tyr His Arg Tyr Met Ala Glu Phe
100 105 110

Lys Ser Gly Asp Glu Ser Lys Asp Ala Thr Glu Arg Ala Lys Ile Phe
115 120 125

Phe His Gly Trp Phe Met Gly Pro Leu Thr Glu Gly Lys Tyr Pro Asp
130 135 140

Ile Met Arg Glu Tyr Val Gly Asp Arg Leu Pro Glu Phe Ser Glu Thr
145 150 155 160

Glu Ala Ala Leu Val Lys Gly Ser Tyr Asp Phe Leu Gly Leu Asn Tyr
165 170 175

Tyr Val Thr Gln Tyr Ala Gln Asn Asn Gln Thr Ile Val Pro Ser Asp
180 185 190

Val His Thr Ala Leu Met Asp Ser Arg Thr Thr Leu Thr Ser Lys Asn
195 200 205

Ala Thr Gly His Ala Pro Gly Pro Pro Phe Asn Ala Ala Ser Tyr Tyr
210 215 220

Tyr Pro Lys Gly Ile Tyr Tyr Val Met Asp Tyr Phe Lys Thr Thr Tyr
225 230 235 240

Gly Asp Pro Leu Ile Tyr Val Thr Glu Asn Gly Phe Ser Thr Pro Gly
245 250 255

Asp Glu Asp Phe Glu Lys Ala Thr Ala Asp Tyr Lys Arg Ile Asp Tyr
260 265 270

Leu Cys Ser His Leu Cys Phe Leu Ser Lys Val Ile Lys Glu Lys Asn
275 280 285

Val Asn Val Lys Gly Tyr Phe Ala Trp Ser Leu Gly Asp Asn Tyr Glu
290 295 300

Phe Cys Asn Gly Phe Thr Val Arg Phe Gly Leu Ser Tyr Val Asp Phe
305 310 315 320

Ala Asn Ile Thr Gly Asp Arg Asp Leu Lys Ala Ser Gly Lys Trp Phe
325 330 335

Gln Lys Phe Ile Asn Val Thr Asp Glu Asp Ser Thr Asn Gln Asp Leu
340 345 350

Leu Arg Ser Ser Val Ser Ser Lys Asn Arg Asp Arg Lys Ser Leu Ala
355 360 365

Asp Ala
370

(2) INFORMATION FOR SEQ ID NO: 233:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1160 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1160

(D) OTHER INFORMATION: / Ceres Seq. ID 1582293
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233:
agaggagaac aatggaagag aagaagaaga cgatggagat ggaagcaaca acaatgaaag 60
gcacggcggc ggaagatata acggagagac tctcttctct cgacaatctc tattttccac 120
gcgccgtcca attaaccgct gcatcttccg accaacgcaa atccatcctc ctcgacctcc 180
tccgtcgtga tcccgccgtt tttctagaga gatatggatc ggagctacta gtagatgaat 240
tgcttgagtt tgatgctatg aaacatgact acgaggttga ttggcatttg aaaaacctgc 300
ggaagaagat aagtccgact tcagaagaga ttaaatcgag gtctgtagct gtgaggaata 360
ggagattggc ttatttgaat aagcttgtat ccgagggaca gtatttctca gaggatgcta 420
tgagagatag agagccgtat ctacatcatg agtatgttgg gaagtttcag gatgtgatgg 480
gaaggaacat ggctaggcct ggagaacgtt ggtctgagac tttgatgaga cgggctgagg 540
aagcggtgtt ggttactcgg attagagagg agcagcagag gttaggtgtt gcagagagtg 600
attgggttgg taatgagaag atggaggaat cagaagagga agaggaggaa tcagaagagg 660
aggaagaaga agaagatgaa gaagcgaaga atcctacaga agctagctct tcaagtctga 720
atggtaaaga acagaaagag aaggcawcaa cagtcttacc gccagaggag atgcaagata 780
tgatggatca gttcacatca atcatggaac agaagttctt atcgggagaa gatcatcaac 840
atttggatta cgcaaagata gacaatgacg agactcttga tgatcattgg cttcgagaga 900
ttggccgtga cgctgaagat aagtactttg atgaagacta attggatatg taagatatga 960
tctctcctct ctctctctct cgttacctgc agagttatat atatcatatg atagagagtg 1020
ctatcatcgg attacaagtc caaacaccaa cacaacgtca tgttaagttt tggttgagag 1080
attattcatc aaatctctat gtacactaat agaattggtt caaattatat ggcccagaat 1140
agtaaagcgg tgcgatgatg
(2) INFORMATION FOR SEQ ID NO: 234:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 312 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..312

(D) OTHER INFORMATION: / Ceres Seq. ID 1582294
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234:
Arg Arg Thr Met Glu Glu Lys Lys Lys Thr Met Glu Met Glu Ala Thr
1 5 10 15

Thr Met Lys Gly Thr Ala Ala Glu Asp Ile Thr Glu Arg Leu Ser Ser
20 25 30

Leu Asp Asn Leu Tyr Phe Pro Arg Ala Val Gln Leu Thr Ala Ala Ser
35 40 45

Ser Asp Gln Arg Lys Ser Ile Leu Leu Asp Leu Leu Arg Arg Asp Pro
50 55 60

Ala Val Phe Leu Glu Arg Tyr Gly Ser Glu Leu Leu Val Asp Glu Leu
65 70 75 80

Leu Glu Phe Asp Ala Met Lys His Asp Tyr Glu Val Asp Trp His Leu
85 90 95

Lys Asn Leu Arg Lys Lys Ile Ser Pro Thr Ser Glu Glu Ile Lys Ser
100 105 110

Arg Ser Val Ala Val Arg Asn Arg Arg Leu Ala Tyr Leu Asn Lys Leu
115 120 125

Val Ser Glu Gly Gln Tyr Phe Ser Glu Asp Ala Met Arg Asp Arg Glu
130 135 140

Pro Tyr Leu His His Glu Tyr Val Gly Lys Phe Gln Asp Val Met Gly
145 150 155 160

Arg Asn Met Ala Arg Pro Gly Glu Arg Trp Ser Glu Thr Leu Met Arg
165 170 175

Arg Ala Glu Glu Ala Val Leu Val Thr Arg Ile Arg Glu Glu Gln Gln
180 185 190

Arg Leu Gly Val Ala Glu Ser Asp Trp Val Gly Asn Glu Lys Met Glu
195 200 205

Glu Ser Glu Glu Glu Glu Glu Glu Ser Glu Glu Glu Glu Glu Glu Glu
210 215 220

Asp Glu Glu Ala Lys Asn Pro Thr Glu Ala Ser Ser Ser Ser Leu Asn
225 230 235 240

Gly Lys Glu Gln Lys Glu Lys Ala Xaa Thr Val Leu Pro Pro Glu Glu
245 250 255

Met Gln Asp Met Met Asp Gln Phe Thr Ser Ile Met Glu Gln Lys Phe
260 265 270

Leu Ser Gly Glu Asp His Gln His Leu Asp Tyr Ala Lys Ile Asp Asn
275 280 285

Asp Glu Thr Leu Asp Asp His Trp Leu Arg Glu Ile Gly Arg Asp Ala
290 295 300

Glu Asp Lys Tyr Phe Asp Glu Asp
305 310
(2) INFORMATION FOR SEQ ID NO: 235:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 309 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..309

(D) OTHER INFORMATION: / Ceres Seq. ID 1582295
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235:
Met Glu Glu Lys Lys Lys Thr Met Glu Met Glu Ala Thr Thr Met Lys
1 5 10 15

Gly Thr Ala Ala Glu Asp Ile Thr Glu Arg Leu Ser Ser Leu Asp Asn
20 25 30

Leu Tyr Phe Pro Arg Ala Val Gln Leu Thr Ala Ala Ser Ser Asp Gln
35 40 45

Arg Lys Ser Ile Leu Leu Asp Leu Leu Arg Arg Asp Pro Ala Val Phe
50 55 60

Leu Glu Arg Tyr Gly Ser Glu Leu Leu Val Asp Glu Leu Leu Glu Phe
65 70 75 80

Asp Ala Met Lys His Asp Tyr Glu Val Asp Trp His Leu Lys Asn Leu
85 90 95

Arg Lys Lys Ile Ser Pro Thr Ser Glu Glu Ile Lys Ser Arg Ser Val
100 105 110

Ala Val Arg Asn Arg Arg Leu Ala Tyr Leu Asn Lys Leu Val Ser Glu
115 120 125

Gly Gln Tyr Phe Ser Glu Asp Ala Met Arg Asp Arg Glu Pro Tyr Leu
130 135 140

His His Glu Tyr Val Gly Lys Phe Gln Asp Val Met Gly Arg Asn Met
145 150 155 160

Ala Arg Pro Gly Glu Arg Trp Ser Glu Thr Leu Met Arg Arg Ala Glu
165 170 175

Glu Ala Val Leu Val Thr Arg Ile Arg Glu Glu Gln Gln Arg Leu Gly
180 185 190

Val Ala Glu Ser Asp Trp Val Gly Asn Glu Lys Met Glu Glu Ser Glu
195 200 205

Glu Glu Glu Glu Glu Ser Glu Glu Glu Glu Glu Glu Glu Asp Glu Glu
210 215 220

Ala Lys Asn Pro Thr Glu Ala Ser Ser Ser Ser Leu Asn Gly Lys Glu
225 230 235 240

Gln Lys Glu Lys Ala Xaa Thr Val Leu Pro Pro Glu Glu Met Gln Asp
245 250 255

Met Met Asp Gln Phe Thr Ser Ile Met Glu Gln Lys Phe Leu Ser Gly
260 265 270

Glu Asp His Gln His Leu Asp Tyr Ala Lys Ile Asp Asn Asp Glu Thr
275 280 285

Leu Asp Asp His Trp Leu Arg Glu Ile Gly Arg Asp Ala Glu Asp Lys
290 295 300

Tyr Phe Asp Glu Asp
305

(2) INFORMATION FOR SEQ ID NO: 236:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 302 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..302

(D) OTHER INFORMATION: / Ceres Seq. ID 1582296
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:
Met Glu Met Glu Ala Thr Thr Met Lys Gly Thr Ala Ala Glu Asp Ile
1 5 10 15

Thr Glu Arg Leu Ser Ser Leu Asp Asn Leu Tyr Phe Pro Arg Ala Val
20 25 30

Gln Leu Thr Ala Ala Ser Ser Asp Gln Arg Lys Ser Ile Leu Leu Asp
35 40 45

Leu Leu Arg Arg Asp Pro Ala Val Phe Leu Glu Arg Tyr Gly Ser Glu
50 55 60

Leu Leu Val Asp Glu Leu Leu Glu Phe Asp Ala Met Lys His Asp Tyr
65 70 75 80

Glu Val Asp Trp His Leu Lys Asn Leu Arg Lys Lys Ile Ser Pro Thr
85 90 95

Ser Glu Glu Ile Lys Ser Arg Ser Val Ala Val Arg Asn Arg Arg Leu
100 105 110

Ala Tyr Leu Asn Lys Leu Val Ser Glu Gly Gln Tyr Phe Ser Glu Asp
115 120 125

Ala Met Arg Asp Arg Glu Pro Tyr Leu His His Glu Tyr Val Gly Lys
130 135 140

Phe Gln Asp Val Met Gly Arg Asn Met Ala Arg Pro Gly Glu Arg Trp
145 150 155 160

Ser Glu Thr Leu Met Arg Arg Ala Glu Glu Ala Val Leu Val Thr Arg
165 170 175

Ile Arg Glu Glu Gln Gln Arg Leu Gly Val Ala Glu Ser Asp Trp Val
180 185 190

Gly Asn Glu Lys Met Glu Glu Ser Glu Glu Glu Glu Glu Glu Ser Glu
195 200 205

Glu Glu Glu Glu Glu Glu Asp Glu Glu Ala Lys Asn Pro Thr Glu Ala
210 215 220

Ser Ser Ser Ser Leu Asn Gly Lys Glu Gln Lys Glu Lys Ala Xaa Thr
225 230 235 240

Val Leu Pro Pro Glu Glu Met Gln Asp Met Met Asp Gln Phe Thr Ser
245 250 255

Ile Met Glu Gln Lys Phe Leu Ser Gly Glu Asp His Gln His Leu Asp
260 265 270

Tyr Ala Lys Ile Asp Asn Asp Glu Thr Leu Asp Asp His Trp Leu Arg
275 280 285

Glu Ile Gly Arg Asp Ala Glu Asp Lys Tyr Phe Asp Glu Asp
290 295 300

(2) INFORMATION FOR SEQ ID NO: 237:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 483 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..483

(D) OTHER INFORMATION: / Ceres Seq. ID 1582315
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag tttttggatc atgactcatc 420
gtgtatgaac ttatttcaac cttagacttg tattctctct tgagttaagt ttgaaatcag 480
atg

(2) INFORMATION FOR SEQ ID NO: 238:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582316
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 239:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582317
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 240:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..470 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582339 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: 
atagagtaaa gaagataaaa aacacaattg aagcttttat aatattttct cagaaacttt 
caaagagctt agaaaatgag tacagctact ttcgttgata ttattatcgc catcctcttg 
cctccactcg gtgtctttct cagatttggt tgcggggttg agttttggat atgtttggtt 
ttgacgctac ttgggtatat tcctgggatc atatacgcca tttatgtcct caccaaatga 
tttaccatct atcatcatct ccttgaacag ctgttccgtc gtgttctcct atctttgtga 
ctgattcagc gtttcttttt ctttcatcag agtttttatg tttcaagtaa tttaattaat 
catcactgtt gtgtttgcat tgttatataa atgttgtgtt gatataaaag aagagagcgt 
tggtttgtac tttgtgtgaa cattttttaa aaatatagtt ggtttattac 
(2) INFORMATION FOR SEQ ID NO: 241:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..31

(D) OTHER INFORMATION: / Ceres Seq. ID 1582340
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241:
Arg Val Lys Lys Ile Lys Asn Thr Ile Glu Ala Phe Ile Ile Phe Ser
1 5 10 15

Gln Lys Leu Ser Lys Ser Leu Glu Asn Glu Tyr Ser Tyr Phe Arg
20 25 30

(2) INFORMATION FOR SEQ ID NO: 242:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..54

(D) OTHER INFORMATION: / Ceres Seq. ID 1582341
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242:
Met Ser Thr Ala Thr Phe Val Asp Ile Ile Ile Ala Ile Leu Leu Pro
1 5 10 15

Pro Leu Gly Val Phe Leu Arg Phe Gly Cys Gly Val Glu Phe Trp Ile
20 25 30

Cys Leu Val Leu Thr Leu Leu Gly Tyr Ile Pro Gly Ile Ile Tyr Ala
35 40 45

Ile Tyr Val Leu Thr Lys
50

(2) INFORMATION FOR SEQ ID NO: 243:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..31

(D) OTHER INFORMATION: / Ceres Seq. ID 1582342
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243:
Met Phe Gly Phe Asp Ala Thr Trp Val Tyr Ser Trp Asp His Ile Arg
1 5 10 15

His Leu Cys Pro His Gln Met Ile Tyr His Leu Ser Ser Ser Pro
20 25 30

(2) INFORMATION FOR SEQ ID NO: 244:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1370 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1370

(D) OTHER INFORMATION: / Ceres Seq. ID 1582349
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:
aatatatacg aaacgtacca caaatttcta actaaagcat tcatagtctc tcgaaagcct 60
cttttcagaa ccgaagctct ttactttcgt ccaccgggaa atatgccagt cgacgtagcc 120
tcaccggccg gaaaaaccgt ctgcgtcacc ggagctggtg gatacatcgc ttcttggatt 180
gttaagatac ttctcgagag aggttacaca gtcaaaggaa ccgtacggaa tccagatgat 240
ccgaagaaca cacatttgag agaactagaa ggaggaaagg agagactgat tctgtgcaaa 300
gcagatcttc aggactacga ggctcttaag gcggcgattg atggttgcga cggcgtcttt 360
cacacggctt ctcctgtcac cgacgatccg gaacaaatgg tggagccggc cgtgaatgga 420
gccaagtttg taattaatgc tgcggctgag gccaaggtca agcgcgtggt catcacctcc 480
tccattggtg ccgtctacat ggacccgaac cgtgaccctg aggctgtcgt tgacgaaagt 540
tgttggagtg atcttgactt ctgcaaaaac accaagaatt ggtattgtta cggcaagatg 600
gtggcggaac aagcggcgtg ggagacagca aaggagaaag gtgttgactt ggtggtgttg 660
aatccggtgc tggttcttgg accgccgtta cagccgacga tcaacgccag tctttaccac 720
gtcctcaaat atctaaccgg ctcggctaag acttatgcta atttgactca agcttatgtg 780
gatgttcgcg atgtcgcgct ggctcatgtt ctggtctatg aggcaccctc ggcctccgga 840
cgttatctcc tagccgagag tgctcgccac cgcggggaag ttgttgagat tctggctaag 900
ctattcccgg agtatcctct tccgaccaag tgcaaggacg agaagaaccc tagagccaag 960
ccatacaaat tcactaacca gaagattaag gacttaggct tagagttcac ttccaccaag 1020
caaagcctct acgacacagt caagagctta caagagaaag gccatcttgc tcctcctcct 1080
cctcctcctt cagcatcgca agaatccgtg gaaaatggca ttaagatcgg gtcttgaaaa 1140
gcttattaat tccctcaagt atccccctta agtatcctta accattgaag ttgcttttgt 1200
ttgttgtctc tggttatgtg aaacctctgt ttcaatatgt ctcgtctggt tatgaatctg 1260
tacactcagt tctttggcca aaccgtttga tgtattttgt agtcaagtct tcatgtttga 1320
tctatgtaat agattacact taagtaaaca acttcatttt tattctatgc
(2) INFORMATION FOR SEQ ID NO: 245:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 378 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..378

(D) OTHER INFORMATION: / Ceres Seq. ID 1582350
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:
Asn Ile Tyr Glu Thr Tyr His Lys Phe Leu Thr Lys Ala Phe Ile Val
1 5 10 15

Ser Arg Lys Pro Leu Phe Arg Thr Glu Ala Leu Tyr Phe Arg Pro Pro
20 25 30

Gly Asn Met Pro Val Asp Val Ala Ser Pro Ala Gly Lys Thr Val Cys
35 40 45

Val Thr Gly Ala Gly Gly Tyr Ile Ala Ser Trp Ile Val Lys Ile Leu
50 55 60

Leu Glu Arg Gly Tyr Thr Val Lys Gly Thr Val Arg Asn Pro Asp Asp
65 70 75 80

Pro Lys Asn Thr His Leu Arg Glu Leu Glu Gly Gly Lys Glu Arg Leu
85 90 95

Ile Leu Cys Lys Ala Asp Leu Gln Asp Tyr Glu Ala Leu Lys Ala Ala
100 105 110

Ile Asp Gly Cys Asp Gly Val Phe His Thr Ala Ser Pro Val Thr Asp
115 120 125

Asp Pro Glu Gln Met Val Glu Pro Ala Val Asn Gly Ala Lys Phe Val
130 135 140

Ile Asn Ala Ala Ala Glu Ala Lys Val Lys Arg Val Val Ile Thr Ser
145 150 155 160

Ser Ile Gly Ala Val Tyr Met Asp Pro Asn Arg Asp Pro Glu Ala Val
165 170 175

Val Asp Glu Ser Cys Trp Ser Asp Leu Asp Phe Cys Lys Asn Thr Lys
180 185 190

Asn Trp Tyr Cys Tyr Gly Lys Met Val Ala Glu Gln Ala Ala Trp Glu
195 200 205

Thr Ala Lys Glu Lys Gly Val Asp Leu Val Val Leu Asn Pro Val Leu
210 215 220

Val Leu Gly Pro Pro Leu Gln Pro Thr Ile Asn Ala Ser Leu Tyr His
225 230 235 240

Val Leu Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Leu Thr
245 250 255

Gln Ala Tyr Val Asp Val Arg Asp Val Ala Leu Ala His Val Leu Val
260 265 270

Tyr Glu Ala Pro Ser Ala Ser Gly Arg Tyr Leu Leu Ala Glu Ser Ala
275 280 285

Arg His Arg Gly Glu Val Val Glu Ile Leu Ala Lys Leu Phe Pro Glu
290 295 300

Tyr Pro Leu Pro Thr Lys Cys Lys Asp Glu Lys Asn Pro Arg Ala Lys
305 310 315 320

Pro Tyr Lys Phe Thr Asn Gln Lys Ile Lys Asp Leu Gly Leu Glu Phe
325 330 335

Thr Ser Thr Lys Gln Ser Leu Tyr Asp Thr Val Lys Ser Leu Gln Glu
340 345 350

Lys Gly His Leu Ala Pro Pro Pro Pro Pro Pro Ser Ala Ser Gln Glu
355 360 365

Ser Val Glu Asn Gly Ile Lys Ile Gly Ser
370 375
(2) INFORMATION FOR SEQ ID NO: 246:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 344 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..344

(D) OTHER INFORMATION: / Ceres Seq. ID 1582351
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:
Met Pro Val Asp Val Ala Ser Pro Ala Gly Lys Thr Val Cys Val Thr
1 5 10 15

Gly Ala Gly Gly Tyr Ile Ala Ser Trp Ile Val Lys Ile Leu Leu Glu
20 25 30

Arg Gly Tyr Thr Val Lys Gly Thr Val Arg Asn Pro Asp Asp Pro Lys
35 40 45

Asn Thr His Leu Arg Glu Leu Glu Gly Gly Lys Glu Arg Leu Ile Leu
50 55 60

Cys Lys Ala Asp Leu Gln Asp Tyr Glu Ala Leu Lys Ala Ala Ile Asp
65 70 75 80

Gly Cys Asp Gly Val Phe His Thr Ala Ser Pro Val Thr Asp Asp Pro
85 90 95

Glu Gln Met Val Glu Pro Ala Val Asn Gly Ala Lys Phe Val Ile Asn
100 105 110

Ala Ala Ala Glu Ala Lys Val Lys Arg Val Val Ile Thr Ser Ser Ile
115 120 125

Gly Ala Val Tyr Met Asp Pro Asn Arg Asp Pro Glu Ala Val Val Asp
130 135 140

Glu Ser Cys Trp Ser Asp Leu Asp Phe Cys Lys Asn Thr Lys Asn Trp
145 150 155 160

Tyr Cys Tyr Gly Lys Met Val Ala Glu Gln Ala Ala Trp Glu Thr Ala
165 170 175

Lys Glu Lys Gly Val Asp Leu Val Val Leu Asn Pro Val Leu Val Leu
180 185 190

Gly Pro Pro Leu Gln Pro Thr Ile Asn Ala Ser Leu Tyr His Val Leu
195 200 205

Lys Tyr Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Leu Thr Gln Ala
210 215 220

Tyr Val Asp Val Arg Asp Val Ala Leu Ala His Val Leu Val Tyr Glu
225 230 235 240

Ala Pro Ser Ala Ser Gly Arg Tyr Leu Leu Ala Glu Ser Ala Arg His
245 250 255

Arg Gly Glu Val Val Glu Ile Leu Ala Lys Leu Phe Pro Glu Tyr Pro
260 265 270

Leu Pro Thr Lys Cys Lys Asp Glu Lys Asn Pro Arg Ala Lys Pro Tyr
275 280 285

Lys Phe Thr Asn Gln Lys Ile Lys Asp Leu Gly Leu Glu Phe Thr Ser
290 295 300






Thr Lys Gln Ser Leu Tyr Asp Thr Val Lys Ser Leu Gln Glu Lys Gly
305 310 315 320

His Leu Ala Pro Pro Pro Pro Pro Pro Ser Ala Ser Gln Glu Ser Val
325 330 335

Glu Asn Gly Ile Lys Ile Gly Ser
340

(2) INFORMATION FOR SEQ ID NO: 247:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 246 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..246

(D) OTHER INFORMATION: / Ceres Seq. ID 1582352
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247:
Met Val Glu Pro Ala Val Asn Gly Ala Lys Phe Val Ile Asn Ala Ala
1 5 10 15

Ala Glu Ala Lys Val Lys Arg Val Val Ile Thr Ser Ser Ile Gly Ala
20 25 30

Val Tyr Met Asp Pro Asn Arg Asp Pro Glu Ala Val Val Asp Glu Ser
35 40 45

Cys Trp Ser Asp Leu Asp Phe Cys Lys Asn Thr Lys Asn Trp Tyr Cys
50 55 60

Tyr Gly Lys Met Val Ala Glu Gln Ala Ala Trp Glu Thr Ala Lys Glu
65 70 75 80

Lys Gly Val Asp Leu Val Val Leu Asn Pro Val Leu Val Leu Gly Pro
85 90 95

Pro Leu Gln Pro Thr Ile Asn Ala Ser Leu Tyr His Val Leu Lys Tyr
100 105 110

Leu Thr Gly Ser Ala Lys Thr Tyr Ala Asn Leu Thr Gln Ala Tyr Val
115 120 125

Asp Val Arg Asp Val Ala Leu Ala His Val Leu Val Tyr Glu Ala Pro
130 135 140

Ser Ala Ser Gly Arg Tyr Leu Leu Ala Glu Ser Ala Arg His Arg Gly
145 150 155 160

Glu Val Val Glu Ile Leu Ala Lys Leu Phe Pro Glu Tyr Pro Leu Pro
165 170 175

Thr Lys Cys Lys Asp Glu Lys Asn Pro Arg Ala Lys Pro Tyr Lys Phe
180 185 190

Thr Asn Gln Lys Ile Lys Asp Leu Gly Leu Glu Phe Thr Ser Thr Lys
195 200 205

Gln Ser Leu Tyr Asp Thr Val Lys Ser Leu Gln Glu Lys Gly His Leu
210 215 220

Ala Pro Pro Pro Pro Pro Pro Ser Ala Ser Gln Glu Ser Val Glu Asn
225 230 235 240

Gly Ile Lys Ile Gly Ser
245

(2) INFORMATION FOR SEQ ID NO: 248:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 508 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..508

(D) OTHER INFORMATION: / Ceres Seq. ID 1582398
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaagataccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgagg ttacagatcg tggattgttc gatttcttgg 420
gaaagaagaa agacgaaaca aaaccagagg agactccgat cgcttcagag tttgagcaga 480
aggttcatat ttcagagccg gagccaga
(2) INFORMATION FOR SEQ ID NO: 249:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582399
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 250:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582400
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 251:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1590 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1590

(D) OTHER INFORMATION: / Ceres Seq. ID 1582409
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:
gtacctaact ctgttttcag tttcacctat ctctcgacgc gagcttcttc ttcttctagg 60
gcttccacgc gactacgcct gccaaaatca ttctacagga agcatgaagc cagtcttctg 120
tgggaacttt gagtatgatg cgcgcgaagg tgacctggaa cgactattca ggaaatacgg 180
caaggttgag agggttgata tgaaagctgg atgtgtttga taatcttggg acgcctatac 240
tcacacctgg ccatgaatgg accataggga aatccatctc ttatccctca aaatcagtca 300
ctttccatcc gcattagcac tgccatctta attgcatttc attcctcatc actttgcaca 360
cttggacatg cctctgcaat ggagaaggct ctctattttc attcatatcc ccgtctgact 420
tccacatttt cagttgttcc ttgacatatt aattccataa tgcaagggtt tgcttttgta 480
tacatggaag atgaaaggga tgcggaagat gccatccgag cacttgaccg ctttgaattt 540
gggcgtaagg gacgcagact tcgtgttgaa tggacaaaga gtgaacgtgg aggtgataaa 600
agatctggtg gtggttcaag gagatcctca tccagcatga gaccttccaa gactctcttt 660
gtgattaact ttgatgcgga taatactagg acccgggatc tagagaaaca ctttgagccg 720
tatggaaaga tcgtaaacgt taggatcagg aggaattttg catttatcca gtatgaggca 780
caagaggatg ccaccagagc attggatgct tcaaataaca gtaagctgat ggataaggtg 840
atctcggtgg agtatgctgt gaaggatgat gatgctagag gaaatggaca cagtcctgaa 900
agacgccgtg ataggtcacc tgaaaggaga aggcgatcac ctagtcctta caaaagagaa 960
agaggaagcc ctgattatgg ccgaggagct agtcctgttg ctgcctacag aaaggaaagg 1020
accagtcctg actatggtcg aagacgtagc ccaagtcctt acaagaaatc aagacgtggc 1080
agtcccgagt atggtcgtga ccgcagaggc aatgatagcc ctcgcaggag ggagagagtc 1140
gcaagcccta ctaagtacag ccgcagtccc aacaacaaga gagagaggat gagccctaat 1200
cacagcccgt tcaagaagga gagtccgaga aatggggttg gtgaagttga aagtcccatt 1260
gaaaggagag agagatcgag gtctagcccc gagaatggcc aagttgaaag ccctgggtca 1320
ataggaagaa gagacagtga tggtgggtat gatggtgcag agagcccaat gcagaagagc 1380
cggtctcctc gttcgccacc agctgacgag tgataagagt ggatccacaa tctctatcaa 1440
agtaggatgt tgtaactgtt tgtagtcaac aacgctatgt cgtcgtgatt aAgtMttttg 1500
tcgtttggtt tttgatagat tcgaactcgg atcactttta ttgtcggatt gaaaaactta 1560
tgtcaagtta ctacatttcc gttttttttt
(2) INFORMATION FOR SEQ ID NO: 252:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 317 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..317

(D) OTHER INFORMATION: / Ceres Seq. ID 1582410
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252:
Met Gln Gly Phe Ala Phe Val Tyr Met Glu Asp Glu Arg Asp Ala Glu
1 5 10 15

Asp Ala Ile Arg Ala Leu Asp Arg Phe Glu Phe Gly Arg Lys Gly Arg
20 25 30

Arg Leu Arg Val Glu Trp Thr Lys Ser Glu Arg Gly Gly Asp Lys Arg
35 40 45

Ser Gly Gly Gly Ser Arg Arg Ser Ser Ser Ser Met Arg Pro Ser Lys
50 55 60

Thr Leu Phe Val Ile Asn Phe Asp Ala Asp Asn Thr Arg Thr Arg Asp
65 70 75 80

Leu Glu Lys His Phe Glu Pro Tyr Gly Lys Ile Val Asn Val Arg Ile
85 90 95

Arg Arg Asn Phe Ala Phe Ile Gln Tyr Glu Ala Gln Glu Asp Ala Thr
100 105 110

Arg Ala Leu Asp Ala Ser Asn Asn Ser Lys Leu Met Asp Lys Val Ile
115 120 125

Ser Val Glu Tyr Ala Val Lys Asp Asp Asp Ala Arg Gly Asn Gly His
130 135 140

Ser Pro Glu Arg Arg Arg Asp Arg Ser Pro Glu Arg Arg Arg Arg Ser
145 150 155 160

Pro Ser Pro Tyr Lys Arg Glu Arg Gly Ser Pro Asp Tyr Gly Arg Gly
165 170 175

Ala Ser Pro Val Ala Ala Tyr Arg Lys Glu Arg Thr Ser Pro Asp Tyr
180 185 190

Gly Arg Arg Arg Ser Pro Ser Pro Tyr Lys Lys Ser Arg Arg Gly Ser
195 200 205

Pro Glu Tyr Gly Arg Asp Arg Arg Gly Asn Asp Ser Pro Arg Arg Arg
210 215 220

Glu Arg Val Ala Ser Pro Thr Lys Tyr Ser Arg Ser Pro Asn Asn Lys
225 230 235 240

Arg Glu Arg Met Ser Pro Asn His Ser Pro Phe Lys Lys Glu Ser Pro
245 250 255

Arg Asn Gly Val Gly Glu Val Glu Ser Pro Ile Glu Arg Arg Glu Arg
260 265 270

Ser Arg Ser Ser Pro Glu Asn Gly Gln Val Glu Ser Pro Gly Ser Ile
275 280 285

Gly Arg Arg Asp Ser Asp Gly Gly Tyr Asp Gly Ala Glu Ser Pro Met
290 295 300

Gln Lys Ser Arg Ser Pro Arg Ser Pro Pro Ala Asp Glu
305 310 315

(2) INFORMATION FOR SEQ ID NO: 253:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 309 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..309
(D) OTHER INFORMATION: / Ceres Seq. ID 1582411
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253:
Met Glu Asp Glu Arg Asp Ala Glu Asp Ala Ile Arg Ala Leu Asp Arg
1 5 10 15

Phe Glu Phe Gly Arg Lys Gly Arg Arg Leu Arg Val Glu Trp Thr Lys
20 25 30

Ser Glu Arg Gly Gly Asp Lys Arg Ser Gly Gly Gly Ser Arg Arg Ser
35 40 45

Ser Ser Ser Met Arg Pro Ser Lys Thr Leu Phe Val Ile Asn Phe Asp
50 55 60

Ala Asp Asn Thr Arg Thr Arg Asp Leu Glu Lys His Phe Glu Pro Tyr
65 70 75 80

Gly Lys Ile Val Asn Val Arg Ile Arg Arg Asn Phe Ala Phe Ile Gln
85 90 95

Tyr Glu Ala Gln Glu Asp Ala Thr Arg Ala Leu Asp Ala Ser Asn Asn
100 105 110

Ser Lys Leu Met Asp Lys Val Ile Ser Val Glu Tyr Ala Val Lys Asp
115 120 125

Asp Asp Ala Arg Gly Asn Gly His Ser Pro Glu Arg Arg Arg Asp Arg
130 135 140

Ser Pro Glu Arg Arg Arg Arg Ser Pro Ser Pro Tyr Lys Arg Glu Arg
145 150 155 160

Gly Ser Pro Asp Tyr Gly Arg Gly Ala Ser Pro Val Ala Ala Tyr Arg
165 170 175

Lys Glu Arg Thr Ser Pro Asp Tyr Gly Arg Arg Arg Ser Pro Ser Pro
180 185 190

Tyr Lys Lys Ser Arg Arg Gly Ser Pro Glu Tyr Gly Arg Asp Arg Arg
195 200 205

Gly Asn Asp Ser Pro Arg Arg Arg Glu Arg Val Ala Ser Pro Thr Lys
210 215 220

Tyr Ser Arg Ser Pro Asn Asn Lys Arg Glu Arg Met Ser Pro Asn His
225 230 235 240

Ser Pro Phe Lys Lys Glu Ser Pro Arg Asn Gly Val Gly Glu Val Glu
245 250 255

Ser Pro Ile Glu Arg Arg Glu Arg Ser Arg Ser Ser Pro Glu Asn Gly
260 265 270

Gln Val Glu Ser Pro Gly Ser Ile Gly Arg Arg Asp Ser Asp Gly Gly
275 280 285

Tyr Asp Gly Ala Glu Ser Pro Met Gln Lys Ser Arg Ser Pro Arg Ser
290 295 300

Pro Pro Ala Asp Glu
305

(2) INFORMATION FOR SEQ ID NO: 254:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 258 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..258
(D) OTHER INFORMATION: / Ceres Seq. ID 1582412
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:
Met Arg Pro Ser Lys Thr Leu Phe Val Ile Asn Phe Asp Ala Asp Asn
1 5 10 15

Thr Arg Thr Arg Asp Leu Glu Lys His Phe Glu Pro Tyr Gly Lys Ile
20 25 30

Val Asn Val Arg Ile Arg Arg Asn Phe Ala Phe Ile Gln Tyr Glu Ala
35 40 45

Gln Glu Asp Ala Thr Arg Ala Leu Asp Ala Ser Asn Asn Ser Lys Leu
50 55 60

Met Asp Lys Val Ile Ser Val Glu Tyr Ala Val Lys Asp Asp Asp Ala
65 70 75 80

Arg Gly Asn Gly His Ser Pro Glu Arg Arg Arg Asp Arg Ser Pro Glu
85 90 95

Arg Arg Arg Arg Ser Pro Ser Pro Tyr Lys Arg Glu Arg Gly Ser Pro
100 105 110

Asp Tyr Gly Arg Gly Ala Ser Pro Val Ala Ala Tyr Arg Lys Glu Arg
115 120 125

Thr Ser Pro Asp Tyr Gly Arg Arg Arg Ser Pro Ser Pro Tyr Lys Lys
130 135 140

Ser Arg Arg Gly Ser Pro Glu Tyr Gly Arg Asp Arg Arg Gly Asn Asp
145 150 155 160

Ser Pro Arg Arg Arg Glu Arg Val Ala Ser Pro Thr Lys Tyr Ser Arg
165 170 175

Ser Pro Asn Asn Lys Arg Glu Arg Met Ser Pro Asn His Ser Pro Phe
180 185 190

Lys Lys Glu Ser Pro Arg Asn Gly Val Gly Glu Val Glu Ser Pro Ile
195 200 205

Glu Arg Arg Glu Arg Ser Arg Ser Ser Pro Glu Asn Gly Gln Val Glu
210 215 220

Ser Pro Gly Ser Ile Gly Arg Arg Asp Ser Asp Gly Gly Tyr Asp Gly
225 230 235 240

Ala Glu Ser Pro Met Gln Lys Ser Arg Ser Pro Arg Ser Pro Pro Ala
245 250 255

Asp Glu

(2) INFORMATION FOR SEQ ID NO: 255:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1358 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..1358
(D) OTHER INFORMATION: / Ceres Seq. ID 1582416
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:
acaacataaa cccaatctct cactcagact aagacagagt cagaaacaat ggcgaaatct 60
ccagaaacag agcatccgaa caaagtcttt ggttggggtg ctagagacaa atccggtgtt 120
ctctctcctt ttcacttctc tagaagagac aatggtgaaa atgatgtgac agtgaagatc 180
ttgttctgtg gagtttgcca cactgattta cacaccatca aaaacgactg gggatactcg 240
tattacccag tagttccagg gcatgaaatc gttgggatcg ctacaaaagt tggtaagaac 300
gtgactaaat tcaaagaagg agatcgtgtc ggagtaggag tgatcagtgg ctcgtgccaa 360
tcttgcgaat cttgtgacca agatcttgaa aactactgtc ctcaaatgtc tttcacatac 420
aatgcgattg gatccgatgg aaccaagaat tacggtggct attcggagaa cattgtggtt 480
gatcaacggt ttgttttgcg gtttccggag aatttaccga gcgattcggg tgcgccgttg 540
ctgtgtgctg gaatcactgt gtatagtcca atgaagtatt atggtatgac tgaggcaggg 600
aagcatttag gggttgctgg acttggtggg cttggtcatg ttgctgttaa gattggtaaa 660
gcttttggtt tgaaagttac tgtcattagt tcttcttcta cgaaagcaga ggaagccatt 720
aatcatcttg gtgctgattc gtttcttgtc acaactgatc ctcagaaaat gaaggctgca 780
attggaacaa tggactacat tatcgatacg atatcagcag tacatgctct gtatccgttg 840
ctcggtttac tcaaagtcaa cggaaagctc attgctttag gcttacctga gaagcctctc 900
gagctaccaa tgttccctct tgttctcgga aggaaaatgg ttggaggaag tgacgtggga 960
gggatgaagg agacacaaga gatgcttgat ttctgcgcta agcacaacat tacagctgat 1020
attgaattga ttaagatgga tgagattaac actgcgatgg agaggcttgc taagtctgat 1080
gttaggtaca ggttcgtgat caacgtggct aactccttga gccctccatg aatgatccgg 1140
atctaagaat tgagcattga ggaggcttta aatctatgtc ataatcttgg tgtttgtttg 1200
tgtctctcga gttatcttcg ttttctgctt tcggtttgag aatcggtttc ttctcagaca 1260
agttacctta tttcgttgtt ttcttctcat gttctgtttc ctgagagaaa ctctttctga 1320
ttccagataa gactttgatc cattttcagt ttgctaat
(2) INFORMATION FOR SEQ ID NO: 256:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 360 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..360
(D) OTHER INFORMATION: / Ceres Seq. ID 1582417
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256:
Met Ala Lys Ser Pro Glu Thr Glu His Pro Asn Lys Val Phe Gly Trp
1 5 10 15

Gly Ala Arg Asp Lys Ser Gly Val Leu Ser Pro Phe His Phe Ser Arg
20 25 30

Arg Asp Asn Gly Glu Asn Asp Val Thr Val Lys Ile Leu Phe Cys Gly
35 40 45

Val Cys His Thr Asp Leu His Thr Ile Lys Asn Asp Trp Gly Tyr Ser
50 55 60

Tyr Tyr Pro Val Val Pro Gly His Glu Ile Val Gly Ile Ala Thr Lys
65 70 75 80

Val Gly Lys Asn Val Thr Lys Phe Lys Glu Gly Asp Arg Val Gly Val
85 90 95

Gly Val Ile Ser Gly Ser Cys Gln Ser Cys Glu Ser Cys Asp Gln Asp
100 105 110

Leu Glu Asn Tyr Cys Pro Gln Met Ser Phe Thr Tyr Asn Ala Ile Gly
115 120 125

Ser Asp Gly Thr Lys Asn Tyr Gly Gly Tyr Ser Glu Asn Ile Val Val
130 135 140

Asp Gln Arg Phe Val Leu Arg Phe Pro Glu Asn Leu Pro Ser Asp Ser
145 150 155 160

Gly Ala Pro Leu Leu Cys Ala Gly Ile Thr Val Tyr Ser Pro Met Lys
165 170 175

Tyr Tyr Gly Met Thr Glu Ala Gly Lys His Leu Gly Val Ala Gly Leu
180 185 190

Gly Gly Leu Gly His Val Ala Val Lys Ile Gly Lys Ala Phe Gly Leu
195 200 205

Lys Val Thr Val Ile Ser Ser Ser Ser Thr Lys Ala Glu Glu Ala Ile
210 215 220

Asn His Leu Gly Ala Asp Ser Phe Leu Val Thr Thr Asp Pro Gln Lys
225 230 235 240

Met Lys Ala Ala Ile Gly Thr Met Asp Tyr Ile Ile Asp Thr Ile Ser
245 250 255

Ala Val His Ala Leu Tyr Pro Leu Leu Gly Leu Leu Lys Val Asn Gly
260 265 270

Lys Leu Ile Ala Leu Gly Leu Pro Glu Lys Pro Leu Glu Leu Pro Met
275 280 285

Phe Pro Leu Val Leu Gly Arg Lys Met Val Gly Gly Ser Asp Val Gly
290 295 300

Gly Met Lys Glu Thr Gln Glu Met Leu Asp Phe Cys Ala Lys His Asn
305 310 315 320

Ile Thr Ala Asp Ile Glu Leu Ile Lys Met Asp Glu Ile Asn Thr Ala
325 330 335

Met Glu Arg Leu Ala Lys Ser Asp Val Arg Tyr Arg Phe Val Ile Asn
340 345 350

Val Ala Asn Ser Leu Ser Pro Pro
355 360
(2) INFORMATION FOR SEQ ID NO: 257:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 241 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..241
(D) OTHER INFORMATION: / Ceres Seq. ID 1582418
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:
Met Ser Phe Thr Tyr Asn Ala Ile Gly Ser Asp Gly Thr Lys Asn Tyr
1 5 10 15

Gly Gly Tyr Ser Glu Asn Ile Val Val Asp Gln Arg Phe Val Leu Arg
20 25 30

Phe Pro Glu Asn Leu Pro Ser Asp Ser Gly Ala Pro Leu Leu Cys Ala
35 40 45

Gly Ile Thr Val Tyr Ser Pro Met Lys Tyr Tyr Gly Met Thr Glu Ala
50 55 60

Gly Lys His Leu Gly Val Ala Gly Leu Gly Gly Leu Gly His Val Ala
65 70 75 80

Val Lys Ile Gly Lys Ala Phe Gly Leu Lys Val Thr Val Ile Ser Ser
85 90 95

Ser Ser Thr Lys Ala Glu Glu Ala Ile Asn His Leu Gly Ala Asp Ser
100 105 110

Phe Leu Val Thr Thr Asp Pro Gln Lys Met Lys Ala Ala Ile Gly Thr
115 120 125

Met Asp Tyr Ile Ile Asp Thr Ile Ser Ala Val His Ala Leu Tyr Pro
130 135 140

Leu Leu Gly Leu Leu Lys Val Asn Gly Lys Leu Ile Ala Leu Gly Leu
145 150 155 160

Pro Glu Lys Pro Leu Glu Leu Pro Met Phe Pro Leu Val Leu Gly Arg
165 170 175

Lys Met Val Gly Gly Ser Asp Val Gly Gly Met Lys Glu Thr Gln Glu
180 185 190

Met Leu Asp Phe Cys Ala Lys His Asn Ile Thr Ala Asp Ile Glu Leu
195 200 205

Ile Lys Met Asp Glu Ile Asn Thr Ala Met Glu Arg Leu Ala Lys Ser
210 215 220

Asp Val Arg Tyr Arg Phe Val Ile Asn Val Ala Asn Ser Leu Ser Pro
225 230 235 240

Pro

(2) INFORMATION FOR SEQ ID NO: 258:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 186 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..186
(D) OTHER INFORMATION: / Ceres Seq. ID 1582419
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:
Met Lys Tyr Tyr Gly Met Thr Glu Ala Gly Lys His Leu Gly Val Ala
1 5 10 15

Gly Leu Gly Gly Leu Gly His Val Ala Val Lys Ile Gly Lys Ala Phe
20 25 30

Gly Leu Lys Val Thr Val Ile Ser Ser Ser Ser Thr Lys Ala Glu Glu
35 40 45

Ala Ile Asn His Leu Gly Ala Asp Ser Phe Leu Val Thr Thr Asp Pro
50 55 60

Gln Lys Met Lys Ala Ala Ile Gly Thr Met Asp Tyr Ile Ile Asp Thr
65 70 75 80

Ile Ser Ala Val His Ala Leu Tyr Pro Leu Leu Gly Leu Leu Lys Val
85 90 95

Asn Gly Lys Leu Ile Ala Leu Gly Leu Pro Glu Lys Pro Leu Glu Leu
100 105 110

Pro Met Phe Pro Leu Val Leu Gly Arg Lys Met Val Gly Gly Ser Asp
115 120 125

Val Gly Gly Met Lys Glu Thr Gln Glu Met Leu Asp Phe Cys Ala Lys
130 135 140

His Asn Ile Thr Ala Asp Ile Glu Leu Ile Lys Met Asp Glu Ile Asn
145 150 155 160

Thr Ala Met Glu Arg Leu Ala Lys Ser Asp Val Arg Tyr Arg Phe Val
165 170 175

Ile Asn Val Ala Asn Ser Leu Ser Pro Pro
180 185
(2) INFORMATION FOR SEQ ID NO: 259: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 570 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -

(B) LOCATION: 1..570 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582420 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:259: 
aaactcatca cttacttaac atactaagag agttattaga tcttgaaaaa catggcttcc 60 
aaggctttga ttctgttggg tctcttcgca attcttctgg tggtctccga agtttctgcc 120 
gcaaggcagt cgggcatggt gaagccagag agtgaggaaa ctgtgcaacc tgaaggttat 180 
cacggaggac atggtggtca cggaggggga ggccactacg gaggaggagg ccacgggcat 240 
ggaggacaca acggaggagg gggccacgga cttgacggat acggaggagg acatggagga 300 
cactacggag gaggaggagg acactacgga ggaggaggag gccacggtgg tggtggacac 360 
tatggaggtg gaggacacca tggaggagga ggtcacgggc tgaacgaacc tgttcagacg 420 
aagccgggtg tttaaaagtt ataactatca aataaattca ccatgcataa ttgcatctct 480 
atatacactt atgtcttata tgtatccatc aaaataaacc atggtgagtt tgtaatgcag 540 
ttccttcaga aatgtgtgga ataatgtttc 
(2) INFORMATION FOR SEQ ID NO: 260:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 127 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..127
(D) OTHER INFORMATION: / Ceres Seq. ID 1582421
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
Met Ala Ser Lys Ala Leu Ile Leu Leu Gly Leu Phe Ala Ile Leu Leu
1 5 10 15

Val Val Ser Glu Val Ser Ala Ala Arg Gln Ser Gly Met Val Lys Pro
20 25 30

Glu Ser Glu Glu Thr Val Gln Pro Glu Gly Tyr His Gly Gly His Gly
35 40 45

Gly His Gly Gly Gly Gly His Tyr Gly Gly Gly Gly His Gly His Gly
50 55 60

Gly His Asn Gly Gly Gly Gly His Gly Leu Asp Gly Tyr Gly Gly Gly
65 70 75 80

His Gly Gly His Tyr Gly Gly Gly Gly Gly His Tyr Gly Gly Gly Gly
85 90 95

Gly His Gly Gly Gly Gly His Tyr Gly Gly Gly Gly His His Gly Gly
100 105 110

Gly Gly His Gly Leu Asn Glu Pro Val Gln Thr Lys Pro Gly Val
115 120 125

(2) INFORMATION FOR SEQ ID NO: 261:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 99 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..99
(D) OTHER INFORMATION: / Ceres Seq. ID 1582422
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261:
Met Val Lys Pro Glu Ser Glu Glu Thr Val Gln Pro Glu Gly Tyr His
1 5 10 15

Gly Gly His Gly Gly His Gly Gly Gly Gly His Tyr Gly Gly Gly Gly
20 25 30

His Gly His Gly Gly His Asn Gly Gly Gly Gly His Gly Leu Asp Gly
35 40 45

Tyr Gly Gly Gly His Gly Gly His Tyr Gly Gly Gly Gly Gly His Tyr
50 55 60

Gly Gly Gly Gly Gly His Gly Gly Gly Gly His Tyr Gly Gly Gly Gly
65 70 75 80

His His Gly Gly Gly Gly His Gly Leu Asn Glu Pro Val Gln Thr Lys
85 90 95

Pro Gly Val

(2) INFORMATION FOR SEQ ID NO: 262:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 70 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..70
(D) OTHER INFORMATION: / Ceres Seq. ID 1582423
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262:
Met Val Val Thr Glu Gly Glu Ala Thr Thr Glu Glu Glu Ala Thr Gly
1 5 10 15

Met Glu Asp Thr Thr Glu Glu Gly Ala Thr Asp Leu Thr Asp Thr Glu
20 25 30

Glu Asp Met Glu Asp Thr Thr Glu Glu Glu Glu Asp Thr Thr Glu Glu
35 40 45

Glu Glu Ala Thr Val Val Val Asp Thr Met Glu Val Glu Asp Thr Met
50 55 60

Glu Glu Glu Val Thr Gly
65 70

(2) INFORMATION FOR SEQ ID NO: 263:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 611 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..611
(D) OTHER INFORMATION: / Ceres Seq. ID 1582548
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263:
aaaaagtcag ctccgagtct gcgttttact tcttctcctt gagtttcttc ttctagatct 60
gatcgcgaat caccaaaagg ttttatttta tacaatgggt cgtggaaaca gctgtggtgg 120
aggtcaaagc tcattggatt atctctttgg tggtgacgct cctgctccta agccagttcc 180
agctcctcgt cccgctccta ctgagtctaa caacggacct gcaccaccag taacagctgt 240
gactgcaacc gcactcacga ctgctactac ttctgttgag cctgcagagc ttaacaagca 300
gattcctgct ggtatcaaaa ctcctgttaa caactatgcc agagctgaag gacagaacac 360
cggcaacttc ctcactgacc gtccttcgac caaagttcac gcagctccgg gaggaggatc 420
atccttggat tatctcttca ctggtggcaa gtaaaataat tgcaaagacc tttatctatc 480
cattgtcttt gctgcgttat ctcactatga aactgtttga tgtgagcctt taaatgataa 540
gaagtcggtt tcttgtctca actcttatct gtaatatttt gctgaaaaaa tgtttgaatc 600
aaaaccttcc c

(2) INFORMATION FOR SEQ ID NO: 264:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 150 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:






(A) NAME/KEY: peptide
(B) LOCATION: 1..150
(D) OTHER INFORMATION: / Ceres Seq. ID 1582549
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:
Lys Ser Gln Leu Arg Val Cys Val Leu Leu Leu Leu Leu Glu Phe Leu
1 5 10 15

Leu Leu Asp Leu Ile Ala Asn His Gln Lys Val Leu Phe Tyr Thr Met
20 25 30

Gly Arg Gly Asn Ser Cys Gly Gly Gly Gln Ser Ser Leu Asp Tyr Leu
35 40 45

Phe Gly Gly Asp Ala Pro Ala Pro Lys Pro Val Pro Ala Pro Arg Pro
50 55 60

Ala Pro Thr Glu Ser Asn Asn Gly Pro Ala Pro Pro Val Thr Ala Val
65 70 75 80

Thr Ala Thr Ala Leu Thr Thr Ala Thr Thr Ser Val Glu Pro Ala Glu
85 90 95

Leu Asn Lys Gln Ile Pro Ala Gly Ile Lys Thr Pro Val Asn Asn Tyr
100 105 110

Ala Arg Ala Glu Gly Gln Asn Thr Gly Asn Phe Leu Thr Asp Arg Pro
115 120 125

Ser Thr Lys Val His Ala Ala Pro Gly Gly Gly Ser Ser Leu Asp Tyr
130 135 140

Leu Phe Thr Gly Gly Lys
145 150

(2) INFORMATION FOR SEQ ID NO: 265:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 119 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..119
(D) OTHER INFORMATION: / Ceres Seq. ID 1582550
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:
Met Gly Arg Gly Asn Ser Cys Gly Gly Gly Gln Ser Ser Leu Asp Tyr
1 5 10 15

Leu Phe Gly Gly Asp Ala Pro Ala Pro Lys Pro Val Pro Ala Pro Arg
20 25 30

Pro Ala Pro Thr Glu Ser Asn Asn Gly Pro Ala Pro Pro Val Thr Ala
35 40 45

Val Thr Ala Thr Ala Leu Thr Thr Ala Thr Thr Ser Val Glu Pro Ala
50 55 60

Glu Leu Asn Lys Gln Ile Pro Ala Gly Ile Lys Thr Pro Val Asn Asn
65 70 75 80

Tyr Ala Arg Ala Glu Gly Gln Asn Thr Gly Asn Phe Leu Thr Asp Arg
85 90 95

Pro Ser Thr Lys Val His Ala Ala Pro Gly Gly Gly Ser Ser Leu Asp
100 105 110

Tyr Leu Phe Thr Gly Gly Lys
115

(2) INFORMATION FOR SEQ ID NO: 266:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1815 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..1815
(D) OTHER INFORMATION: / Ceres Seq. ID 1582551
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266:
cccgtcattt tggtctgagg ttttggtgac gatggcgacg gcggtagtga tgaacggcga 60
gctgaaaaag caacctcggc caggtaaagg cggctatcag ggccgtggat taactgaaga 120
agaagctcga gttcgcgcca tatcggagat tgttagcacc atgattgagc gttcacaccg 180
caacgagaat gttgacctaa acgcaattaa aaccgccgct tgccggaaat acggcctagc 240
acgtgcgcct aagctcgttg agatgattgc tgcgcttcct gattcagaga gagagactct 300
tctcccgaag ctccgtgcca aaccggttcg aacagcttca gggatcgccg ttgtggcggt 360
tatgtcgaag cctcataggt gcccgcatat agctacgacg gggaatatat gcgtttattg 420
tcccggtgga cctgactctg actttgagta tagtactcag tcttacactg gatatgagcc 480
taccagcatg cgagctattc gagccaggta caatccatat gttcaggcaa gaagcaggat 540
agatcagctg aagaggttgg gtcacagtgt agataaggtt gagttcattt tgatgggagg 600
tactttcatg tcactgcctg ctgagtatcg ggatttcttc atacggaatc ttcatgatgc 660
tttatcagga cacacttctg ccaacgttga agaggcagtt gcttactctg aacatagtgc 720
aactaaatgc attgggatga caattgAaaa cgaggccaga ttactgcctt ggacctcatt 780
tacgacaaat gctgatttac ggttgcaccc ggctagagat aggtgtccag agcacatatg 840
aagatgttgc ccgtgacaca aatagaggtc atactgttgc tgctgtagct gactgcttct 900
gcttggctaa agatgctggt ttcaaggtgg ttgcacatat gatgcctgat cttcctaatg 960
ttggggttga gagagacatg gaaagtttca aggagttttt cgagagccca tcttttagag 1020
cagatgggtt aaaaatatat cccacccttg tgatccgtgg aactggactt tatgaactat 1080
ggaaaactgg gaggtaccga aattatccac ctgagcagct tgtggatata gttgcaagga 1140
ttctctccat ggtacctcca tggacacgtg tatatagagt tcagcgtgat attcctatgc 1200
ctctggttac gtcaggggta gaaaaaggaa atcttcgtga actggctcta gccagaatgg 1260
atgacttggg ccttaaatgc cgtgatgtcc gtactcgtga agctggaatt caggacattc 1320
atcataaaat taagccagaa caagtagagc ttgtgcgtcg tgattacact gccaatgaag 1380
gttgggagac gttcctttct tatgaagata cacgccagga cattcttgtt ggattgctac 1440
gtttgcgaaa atgcgggaag aatgtaacgt gtccagaact catgggaaag tgttctgttg 1500
tccgtgagct tcatgtatac ggaacagctg taccagttca tggtcgagat gctgataagt 1560
tgcaacatca gggctatggt acacttctga tggaagaagc agagaggatt gctagaagag 1620
aacatcgatc taacaaaatc ggtgtgattt ctggtgtagg aaccagacat tactacagaa 1680
agttgggtta tgaattggaa ggtccttaca tggtgaagca tcttctttga aattatgttt 1740
acttaaaaac caaattcgta aaatctttat gttgaactag tgagattaat cccttgtggt 1800
atcattttgc tcttc

(2) INFORMATION FOR SEQ ID NO: 267:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 279 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..279
(D) OTHER INFORMATION: / Ceres Seq. ID 1582552
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267:
Pro Ser Phe Trp Ser Glu Val Leu Val Thr Met Ala Thr Ala Val Val
1 5 10 15

Met Asn Gly Glu Leu Lys Lys Gln Pro Arg Pro Gly Lys Gly Gly Tyr
20 25 30

Gln Gly Arg Gly Leu Thr Glu Glu Glu Ala Arg Val Arg Ala Ile Ser
35 40 45

Glu Ile Val Ser Thr Met Ile Glu Arg Ser His Arg Asn Glu Asn Val
50 55 60

Asp Leu Asn Ala Ile Lys Thr Ala Ala Cys Arg Lys Tyr Gly Leu Ala
65 70 75 80

Arg Ala Pro Lys Leu Val Glu Met Ile Ala Ala Leu Pro Asp Ser Glu
85 90 95

Arg Glu Thr Leu Leu Pro Lys Leu Arg Ala Lys Pro Val Arg Thr Ala
100 105 110

Ser Gly Ile Ala Val Val Ala Val Met Ser Lys Pro His Arg Cys Pro
115 120 125

His Ile Ala Thr Thr Gly Asn Ile Cys Val Tyr Cys Pro Gly Gly Pro
130 135 140

Asp Ser Asp Phe Glu Tyr Ser Thr Gln Ser Tyr Thr Gly Tyr Glu Pro
145 150 155 160

Thr Ser Met Arg Ala Ile Arg Ala Arg Tyr Asn Pro Tyr Val Gln Ala
165 170 175

Arg Ser Arg Ile Asp Gln Leu Lys Arg Leu Gly His Ser Val Asp Lys
180 185 190

Val Glu Phe Ile Leu Met Gly Gly Thr Phe Met Ser Leu Pro Ala Glu
195 200 205

Tyr Arg Asp Phe Phe Ile Arg Asn Leu His Asp Ala Leu Ser Gly His
210 215 220

Thr Ser Ala Asn Val Glu Glu Ala Val Ala Tyr Ser Glu His Ser Ala
225 230 235 240

Thr Lys Cys Ile Gly Met Thr Ile Glu Asn Glu Ala Arg Leu Leu Pro
245 250 255

Trp Thr Ser Phe Thr Thr Asn Ala Asp Leu Arg Leu His Pro Ala Arg
260 265 270

Asp Arg Cys Pro Glu His Ile
275

(2) INFORMATION FOR SEQ ID NO: 268:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 269 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..269
(D) OTHER INFORMATION: / Ceres Seq. ID 1582553
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268:
Met Ala Thr Ala Val Val Met Asn Gly Glu Leu Lys Lys Gln Pro Arg
1 5 10 15

Pro Gly Lys Gly Gly Tyr Gln Gly Arg Gly Leu Thr Glu Glu Glu Ala
20 25 30

Arg Val Arg Ala Ile Ser Glu Ile Val Ser Thr Met Ile Glu Arg Ser
35 40 45

His Arg Asn Glu Asn Val Asp Leu Asn Ala Ile Lys Thr Ala Ala Cys
50 55 60

Arg Lys Tyr Gly Leu Ala Arg Ala Pro Lys Leu Val Glu Met Ile Ala
65 70 75 80

Ala Leu Pro Asp Ser Glu Arg Glu Thr Leu Leu Pro Lys Leu Arg Ala
85 90 95

Lys Pro Val Arg Thr Ala Ser Gly Ile Ala Val Val Ala Val Met Ser
100 105 110

Lys Pro His Arg Cys Pro His Ile Ala Thr Thr Gly Asn Ile Cys Val
115 120 125

Tyr Cys Pro Gly Gly Pro Asp Ser Asp Phe Glu Tyr Ser Thr Gln Ser
130 135 140

Tyr Thr Gly Tyr Glu Pro Thr Ser Met Arg Ala Ile Arg Ala Arg Tyr
145 150 155 160

Asn Pro Tyr Val Gln Ala Arg Ser Arg Ile Asp Gln Leu Lys Arg Leu
165 170 175

Gly His Ser Val Asp Lys Val Glu Phe Ile Leu Met Gly Gly Thr Phe
180 185 190

Met Ser Leu Pro Ala Glu Tyr Arg Asp Phe Phe Ile Arg Asn Leu His
195 200 205

Asp Ala Leu Ser Gly His Thr Ser Ala Asn Val Glu Glu Ala Val Ala
210 215 220

Tyr Ser Glu His Ser Ala Thr Lys Cys Ile Gly Met Thr Ile Glu Asn
225 230 235 240

Glu Ala Arg Leu Leu Pro Trp Thr Ser Phe Thr Thr Asn Ala Asp Leu
245 250 255

Arg Leu His Pro Ala Arg Asp Arg Cys Pro Glu His Ile
260 265

(2) INFORMATION FOR SEQ ID NO: 269: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..313

(D) OTHER INFORMATION: / Ceres Seq. ID 1582554
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269:
Met Leu Ile Tyr Gly Cys Thr Arg Leu Glu Ile Gly Val Gln Ser Thr
1 5 10 15

Tyr Glu Asp Val Ala Arg Asp Thr Asn Arg Gly His Thr Val Ala Ala

20 25 30

Val Ala Asp Cys Phe Cys Leu Ala Lys Asp Ala Gly Phe Lys Val Val

35 40 45

Ala His Met Met Pro Asp Leu Pro Asn Val Gly Val Glu Arg Asp Met

50 55 60

Glu Ser Phe Lys Glu Phe Phe Glu Ser Pro Ser Phe Arg Ala Asp Gly
65 70 75 80

Leu Lys Ile Tyr Pro Thr Leu Val Ile Arg Gly Thr Gly Leu Tyr Glu

85 90 95

Leu Trp Lys Thr Gly Arg Tyr Arg Asn Tyr Pro Pro Glu Gln Leu Val

100 105 110

Asp Ile Val Ala Arg Ile Leu Ser Met Val Pro Pro Trp Thr Arg Val

115 120 125

Tyr Arg Val Gln Arg Asp Ile Pro Met Pro Leu Val Thr Ser Gly Val

130 135 140

Glu Lys Gly Asn Leu Arg Glu Leu Ala Leu Ala Arg Met Asp Asp Leu
145 150 155 160

Gly Leu Lys Cys Arg Asp Val Arg Thr Arg Glu Ala Gly Ile Gln Asp

165 170 175

Ile His His Lys Ile Lys Pro Glu Gln Val Glu Leu Val Arg Arg Asp

180 185 190

Tyr Thr Ala Asn Glu Gly Trp Glu Thr Phe Leu Ser Tyr Glu Asp Thr

195 200 205

Arg Gln Asp Ile Leu Val Gly Leu Leu Arg Leu Arg Lys Cys Gly Lys

210 215 220

Asn Val Thr Cys Pro Glu Leu Met Gly Lys Cys Ser Val Val Arg Glu
225 230 235 240

Leu His Val Tyr Gly Thr Ala Val Pro Val His Gly Arg Asp Ala Asp

245 250 255

Lys Leu Gln His Gln Gly Tyr Gly Thr Leu Leu Met Glu Glu Ala Glu

260 265 270

Arg Ile Ala Arg Arg Glu His Arg Ser Asn Lys Ile Gly Val Ile Ser

275 280 285

Gly Val Gly Thr Arg His Tyr Tyr Arg Lys Leu Gly Tyr Glu Leu Glu

290 295 300

Gly Pro Tyr Met Val Lys His Leu Leu
305 310
(2) INFORMATION FOR SEQ ID NO: 270: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1187

(D) OTHER INFORMATION: / Ceres Seq. ID 1582559
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270:
ccaacttaaa aaaccctaat ttctcaatct cttcttctac ttttttttat aacgatggct 60
tcagaggatc aatcggcggc gagatctacc gggaaggtga actggttcaa cgcttctaaa 120
ggctatggtt tcattactcc tgacgatggc agcgtagagc ttttcgttca tcaatcttca 180
attgtctccg aaggttaccg gagtttaacc gtcggcgacg cggttgagtt cgctattact 240
cagggaagcg acggtaagac taaagccgtc aatgttactg ctcctggtgg tggttctctc 300
aagaaggaga ataactctcg tggtaacggt gctaggcgcg gcggcggtgg aagcggttgc 360
tacaattgcg gtgagttagg tcatatctct aaagattgtg gtattggtgg cggcggcgga 420
ggtggtgaac gtagatctag aggaggagaa ggttgttaca attgtggtga tactggtcac 480
ttcgctaggg attgtacttc agctggaaac ggtgaccaac gtggagccac caaaggtgga 540
aacgatggtt gctacacttg cggtgatgtt ggtcacttgg ctagtattgt actcagaaat 600
cagttggaaa cggagaccaa cgtggagcgg tcaaaggtgg aaacgatggt tgctacactt 660
gtggtgatgt tggtcacttt gctagggatt gtactcagaa ggttgctgcc ggaaacgtca 720
gaagcggtgg tggtggtagt ggaacttgtt attcatgcgg tggagttggt cacattgcaa 780
gagattgtcc gactaagaga cagccttctc gtgggtgtta ccagtgtggt ggttctggtc 840
acttggctcg tgattgtgac cagagaggaa gcggtggagg aggtaatgat aatgcgtgct 900
acaagtgtgg taaggaaggt cactttgcaa gggaatgttc ttctgtagct taatcgattt 960
cctaatcagc aaaacaaaaa aacaagaatg aaattgaatc gagttatata gtttggtata 1020
tattactctt cgttttcatt tatctttttt tttgttgttg atgggaatga aattgcctgg 1080
tccttttggt gtgtttttga gcttttatta ttatacagag tgatcccttt ttgttataac 1140
tattacaagt ttttagcttt atttgatatg gatgctctct ccttttc
(2) INFORMATION FOR SEQ ID NO: 271:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 297 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..297

(D) OTHER INFORMATION: / Ceres Seq. ID 1582560
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271:
Met Ala Ser Glu Asp Gln Ser Ala Ala Arg Ser Thr Gly Lys Val Asn
1 5 10 15

Trp Phe Asn Ala Ser Lys Gly Tyr Gly Phe Ile Thr Pro Asp Asp Gly

20 25 30

Ser Val Glu Leu Phe Val His Gln Ser Ser Ile Val Ser Glu Gly Tyr

35 40 45

Arg Ser Leu Thr Val Gly Asp Ala Val Glu Phe Ala Ile Thr Gln Gly

50 55 60

Ser Asp Gly Lys Thr Lys Ala Val Asn Val Thr Ala Pro Gly Gly Gly
65 70 75 80

Ser Leu Lys Lys Glu Asn Asn Ser Arg Gly Asn Gly Ala Arg Arg Gly

85 90 95

Gly Gly Gly Ser Gly Cys Tyr Asn Cys Gly Glu Leu Gly His Ile Ser

100 105 110

Lys Asp Cys Gly Ile Gly Gly Gly Gly Gly Gly Gly Glu Arg Arg Ser

115 120 125

Arg Gly Gly Glu Gly Cys Tyr Asn Cys Gly Asp Thr Gly His Phe Ala

130 135 140

Arg Asp Cys Thr Ser Ala Gly Asn Gly Asp Gln Arg Gly Ala Thr Lys
145 150 155 160

Gly Gly Asn Asp Gly Cys Tyr Thr Cys Gly Asp Val Gly His Leu Ala

165 170 175

Ser Ile Val Leu Arg Asn Gln Leu Glu Thr Glu Thr Asn Val Glu Arg

180 185 190

Ser Lys Val Glu Thr Met Val Ala Thr Leu Val Val Met Leu Val Thr

195 200 205

Leu Leu Gly Ile Val Leu Arg Arg Leu Leu Pro Glu Thr Ser Glu Ala

210 215 220

Val Val Val Val Val Glu Leu Val Ile His Ala Val Glu Leu Val Thr
225 230 235 240

Leu Gln Glu Ile Val Arg Leu Arg Asp Ser Leu Leu Val Gly Val Thr

245 250 255

Ser Val Val Val Leu Val Thr Trp Leu Val Ile Val Thr Arg Glu Glu

260 265 270

Ala Val Glu Glu Val Met Ile Met Arg Ala Thr Ser Val Val Arg Lys

275 280 285

Val Thr Leu Gln Gly Asn Val Leu Leu

290 295

(2) INFORMATION FOR SEQ ID NO: 272:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1120 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1120

(D) OTHER INFORMATION: / Ceres Seq. ID 1582614
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272:
aaaactccga ttgagaggtg tttgcatgat tcctgagctt gtctgtgtca atagtgaggc 60
actacatcga ggaataagaa agaagatcta aggagctaaa gctattatgt caacgaacac 120
tgaatcttct tcttattctt ctcttcctag tcaaaggctt ttgggtaaag tggcattgat 180
cactggagga gccacaggga taggtgagag cattgttcgt ctgttccaca agcacggtgc 240
caaagtctgc attgttgatc tgcaagatga tctcggaggt gaggtgtgta aaagtctgct 300
tcgtggtgag tccaaggaga cggctttttt catccatggc gatgttagag tggaagatga 360
cattagcaat gcggttgact ttgcagtcaa aaattttggg acgcttgata tacttatcaa 420
caatgcagga ttatgtggag caccgtgccc tgatattcgt aattatagtt tgagtgagtt 480
cgagatgacc tttgatgtga atgtgaaagg agcttttcta agcatgaaac atgcagctcg 540
tgtaatgata ccggagaaga aagggtcgat agtttcctta tgtagtgtgg gaggtgttgt 600
gggaggcgtt ggtccacatt cttatgttgg ttccaagcat gctgttctag gcttgactag 660
gagtgttgca gcggagcttg gacagcacgg gatacgtgtg aactgtgttt cgccttacgc 720
ggttgcaact aaactcgctt tggctcattt gccggaggaa gaaagaacgg aggatgcatt 780
tgttggtttc aggaattttg ctgctgcaaa cgcgaatcta aaaggggtgg aactgacggt 840
tggtgatgta gcgaacgctg ttctgttttt ggctagcgat gactcgcggt acataagcgg 900
agataatttg atgattgatg gaggattcac ttgcactaac cactccttta aagtcttcag 960
atgatgcatt ttgctaaaga atgttgttta atgtttattg tccgccaatt tatcatgtct 1020
atcaaataat ttaactgtgg agcttattgk ggttttaatt gttactttta gcattgtaga 1080
aatgtttgat gttaactaca tttcttactg gtagacattg
(2) INFORMATION FOR SEQ ID NO: 273:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 285 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..285

(D) OTHER INFORMATION: / Ceres Seq. ID 1582615
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273:
Met Ser Thr Asn Thr Glu Ser Ser Ser Tyr Ser Ser Leu Pro Ser Gln

1 5 10 15

Arg Leu Leu Gly Lys Val Ala Leu Ile Thr Gly Gly Ala Thr Gly Ile

20 25 30

Gly Glu Ser Ile Val Arg Leu Phe His Lys His Gly Ala Lys Val Cys

35 40 45

Ile Val Asp Leu Gln Asp Asp Leu Gly Gly Glu Val Cys Lys Ser Leu

50 55 60

Leu Arg Gly Glu Ser Lys Glu Thr Ala Phe Phe Ile His Gly Asp Val
65 70 75 80

Arg Val Glu Asp Asp Ile Ser Asn Ala Val Asp Phe Ala Val Lys Asn

85 90 95

Phe Gly Thr Leu Asp Ile Leu Ile Asn Asn Ala Gly Leu Cys Gly Ala

100 105 110

Pro Cys Pro Asp Ile Arg Asn Tyr Ser Leu Ser Glu Phe Glu Met Thr

115 120 125

Phe Asp Val Asn Val Lys Gly Ala Phe Leu Ser Met Lys His Ala Ala

130 135 140

Arg Val Met Ile Pro Glu Lys Lys Gly Ser Ile Val Ser Leu Cys Ser
145 150 155 160

Val Gly Gly Val Val Gly Gly Val Gly Pro His Ser Tyr Val Gly Ser

165 170 175

Lys His Ala Val Leu Gly Leu Thr Arg Ser Val Ala Ala Glu Leu Gly

180 185 190

Gln His Gly Ile Arg Val Asn Cys Val Ser Pro Tyr Ala Val Ala Thr

195 200 205

Lys Leu Ala Leu Ala His Leu Pro Glu Glu Glu Arg Thr Glu Asp Ala

210 215 220

Phe Val Gly Phe Arg Asn Phe Ala Ala Ala Asn Ala Asn Leu Lys Gly
225 230 235 240

Val Glu Leu Thr Val Gly Asp Val Ala Asn Ala Val Leu Phe Leu Ala

245 250 255

Ser Asp Asp Ser Arg Tyr Ile Ser Gly Asp Asn Leu Met Ile Asp Gly

260 265 270

Gly Phe Thr Cys Thr Asn His Ser Phe Lys Val Phe Arg
275 280 285

(2) INFORMATION FOR SEQ ID NO: 274:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 159 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..159

(D) OTHER INFORMATION: / Ceres Seq. ID 1582616
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274:
Met Thr Phe Asp Val Asn Val Lys Gly Ala Phe Leu Ser Met Lys His
1 5 10 15

Ala Ala Arg Val Met Ile Pro Glu Lys Lys Gly Ser Ile Val Ser Leu

20 25 30

Cys Ser Val Gly Gly Val Val Gly Gly Val Gly Pro His Ser Tyr Val

35 40 45

Gly Ser Lys His Ala Val Leu Gly Leu Thr Arg Ser Val Ala Ala Glu

50 55 60

Leu Gly Gln His Gly Ile Arg Val Asn Cys Val Ser Pro Tyr Ala Val
65 70 75 80

Ala Thr Lys Leu Ala Leu Ala His Leu Pro Glu Glu Glu Arg Thr Glu

85 90 95

Asp Ala Phe Val Gly Phe Arg Asn Phe Ala Ala Ala Asn Ala Asn Leu

100 105 110

Lys Gly Val Glu Leu Thr Val Gly Asp Val Ala Asn Ala Val Leu Phe

115 120 125

Leu Ala Ser Asp Asp Ser Arg Tyr Ile Ser Gly Asp Asn Leu Met Ile

130 135 140

Asp Gly Gly Phe Thr Cys Thr Asn His Ser Phe Lys Val Phe Arg
145 150 155

(2) INFORMATION FOR SEQ ID NO: 275: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 146 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..146

(D) OTHER INFORMATION: / Ceres Seq. ID 1582617
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275:
Met Lys His Ala Ala Arg Val Met Ile Pro Glu Lys Lys Gly Ser Ile
1 5 10 15

Val Ser Leu Cys Ser Val Gly Gly Val Val Gly Gly Val Gly Pro His

20 25 30

Ser Tyr Val Gly Ser Lys His Ala Val Leu Gly Leu Thr Arg Ser Val

35 40 45

Ala Ala Glu Leu Gly Gln His Gly Ile Arg Val Asn Cys Val Ser Pro

50 55 60

Tyr Ala Val Ala Thr Lys Leu Ala Leu Ala His Leu Pro Glu Glu Glu
65 70 75 80

Arg Thr Glu Asp Ala Phe Val Gly Phe Arg Asn Phe Ala Ala Ala Asn

85 90 95

Ala Asn Leu Lys Gly Val Glu Leu Thr Val Gly Asp Val Ala Asn Ala

100 105 110

Val Leu Phe Leu Ala Ser Asp Asp Ser Arg Tyr Ile Ser Gly Asp Asn

115 120 125

Leu Met Ile Asp Gly Gly Phe Thr Cys Thr Asn His Ser Phe Lys Val

130 135 140

Phe Arg
145

(2) INFORMATION FOR SEQ ID NO: 276: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2380 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..2380

(D) OTHER INFORMATION: / Ceres Seq. ID 1582642
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:276:
ctcgaggtct tagggtttta gctgttgact tgtcgcagct tcactggaga attgaatgaa 60
ttttcttatg gtttcaggaa ggtgatgaat gttgagatcg atttgatttc aatcaacgtt 120
caatgaagat tgtgatgatt tctcataaga tggatttgat tttggttttc ataatcgtaa 180
tcggaggatc tattttccga cgagtttctg ctaatttcac cgaaccttgt aacggaagat 240
gcggtggatt gactctgcct tatcctttcg ggttttcaaa cggttgttcg atccgattcg 300
attgctctgc ggcggagaaa ccgatgatcg gagacttttc cgttcaaaac gtgacggaaa 360
acagtatatt tgtcggtctc tctcacaatt gtactcggaa gattgaagat atgaatccgc 420
ttttcggcga gaatttcgca ccgacgtcgg agaacagttt cttgatggag aattgcaacc 480
gtaccaccga tggatgttcg atcaaacaga agtttttgga gaatgtgctg aaacttaaaa 540
gttgtgatgc tactggaaac ataagttgtt tttctttaga tagtaattcg agttcgaaga 600
actcagctaa gtttttcagt atgaagacat taaggaacag ctcgtgtagt ttgttgttct 660
cgtcgatagc tttcgagtct gtaggtgtca atgcgggtat agcgttagag tttgagagag 720
ttcggttagg ttggtggctt aagggaggtt gccagagcgg aacttgcgcg gctaacaccg 780
attgtacaga cgttgaaact cctcatggat atgcaggaca ccggtgctca tgtcttgacg 840
gtttccacgg tgacggatac accaaccctt gccagagagc actaccggag tgccgtggtt 900
ccaagctcgt gtggagacat tgtagatcta atcttattac tattgtagga ggaactgttg 960
gtggagcgtt tttactagct ggcttggctc ttttgttctt ttgtaagcgg gtctactcct 1020
ttgagaagtc atttaagcgc aaagcgtctt ttgtctgaag ctgcagggaa ctcgagtgtc 1080
gcctttttcc cttacaagga aatcgagaaa gcgacagatg gtttctctga aaagcagaag 1140
ctaggaatag gtgcatatgg tacggtctat agaggaaagc tccaaaatga tgaatgggtt 1200
gctatcaaaa gactcagaca tagagattca gaaagtcttg accaagtcat gaatgagatc 1260
aagcttcttt cctctgtgag tcacccgaat cttgtccgtc tcttagggtg ttgtatagaa 1320
caaggcgatc cagttcttgt ttatgagtac atgccgaatg gaactctatc agaacatcta 1380
caaagagata gagggagtgg ccttccatgg accttgcgtc tcactgttgc tactcaaaca 1440
gctaaagcaa tcgcgtatct ccactcttca atgaacccgc caatctatca ccgtgacatc 1500
aaatctacca atatccttct tgattatgat ttcaactcca aagttgcgga tttcgggctc 1560
tctagactgg gaatgacgga atcttcccac atatcaacgg ctcctcaagg gactcctggt 1620
tatcttgacc cgcagtacca tcaatgcttt catctttctg ataagagcga cgtctacagc 1680
tttggagtcg tccttgccga gattataacg ggattgaaag tcgttgattt cacacgtccc 1740
cataccgaaa tcaacttagc agctcttgct gttgacaaaa tcgggtcagg ttgtatcgat 1800
gagataatag acccgattct tgacttggat ctcgacgcat ggactctctc atccatacac 1860
accgtggctg agcttgcatt tcgatgctta gccttccaca gtgacatgag accgacaatg 1920
accgaagtag cggacgagct tgaacagata agactcagtg gttggattcc aagcatgagc 1980
ttggattcac cagccggttc tctccgttca tctgatcgag gaagcgaaag atcagttaaa 2040
caatcatcaa taggaagcag aagagtcgtt atccctcaga aacaacctga ttgcctcgca 2100
tccgtcgaag agattagcga tagctcaccc atctcagttc aagatccttg gttaagtgca 2160
caaagctcac cgtctacaaa tacattgctc ggtaacattc caagatgata gtatgatctc 2220
aaaacacgat taccactttc ttgtttatat aagtccctct tttgtgtagc tttcacaatc 2280
attcagtgtt tataatgcta agtagagaga ttaacaatgg tgaaaacagg tttatacatg 2340
agttataatg taaacgatct tgaaacaaga tctttcaatc
(2) INFORMATION FOR SEQ ID NO: 277:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 311 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..311

(D) OTHER INFORMATION: / Ceres Seq. ID 1582643
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277:
Met Lys Ile Val Met Ile Ser His Lys Met Asp Leu Ile Leu Val Phe
1 5 10 15

Ile Ile Val Ile Gly Gly Ser Ile Phe Arg Arg Val Ser Ala Asn Phe

20 25 30

Thr Glu Pro Cys Asn Gly Arg Cys Gly Gly Leu Thr Leu Pro Tyr Pro

35 40 45

Phe Gly Phe Ser Asn Gly Cys Ser Ile Arg Phe Asp Cys Ser Ala Ala

50 55 60

Glu Lys Pro Met Ile Gly Asp Phe Ser Val Gln Asn Val Thr Glu Asn
65 70 75 80

Ser Ile Phe Val Gly Leu Ser His Asn Cys Thr Arg Lys Ile Glu Asp

85 90 95

Met Asn Pro Leu Phe Gly Glu Asn Phe Ala Pro Thr Ser Glu Asn Ser

100 105 110

Phe Leu Met Glu Asn Cys Asn Arg Thr Thr Asp Gly Cys Ser Ile Lys

115 120 125

Gln Lys Phe Leu Glu Asn Val Leu Lys Leu Lys Ser Cys Asp Ala Thr

130 135 140

Gly Asn Ile Ser Cys Phe Ser Leu Asp Ser Asn Ser Ser Ser Lys Asn
145 150 155 160

Ser Ala Lys Phe Phe Ser Met Lys Thr Leu Arg Asn Ser Ser Cys Ser

165 170 175

Leu Leu Phe Ser Ser Ile Ala Phe Glu Ser Val Gly Val Asn Ala Gly

180 185 190

Ile Ala Leu Glu Phe Glu Arg Val Arg Leu Gly Trp Trp Leu Lys Gly

195 200 205

Gly Cys Gln Ser Gly Thr Cys Ala Ala Asn Thr Asp Cys Thr Asp Val

210 215 220

Glu Thr Pro His Gly Tyr Ala Gly His Arg Cys Ser Cys Leu Asp Gly
225 230 235 240

Phe His Gly Asp Gly Tyr Thr Asn Pro Cys Gln Arg Ala Leu Pro Glu

245 250 255

Cys Arg Gly Ser Lys Leu Val Trp Arg His Cys Arg Ser Asn Leu Ile

260 265 270

Thr Ile Val Gly Gly Thr Val Gly Gly Ala Phe Leu Leu Ala Gly Leu

275 280 285

Ala Leu Leu Phe Phe Cys Lys Arg Val Tyr Ser Phe Glu Lys Ser Phe

290 295 300

Lys Arg Lys Ala Ser Phe Val
305 310
(2) INFORMATION FOR SEQ ID NO: 278:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 307 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..307

(D) OTHER INFORMATION: / Ceres Seq. ID 1582644
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278:
Met Ile Ser His Lys Met Asp Leu Ile Leu Val Phe Ile Ile Val Ile
1 5 10 15

Gly Gly Ser Ile Phe Arg Arg Val Ser Ala Asn Phe Thr Glu Pro Cys

20 25 30

Asn Gly Arg Cys Gly Gly Leu Thr Leu Pro Tyr Pro Phe Gly Phe Ser

35 40 45

Asn Gly Cys Ser Ile Arg Phe Asp Cys Ser Ala Ala Glu Lys Pro Met

50 55 60

Ile Gly Asp Phe Ser Val Gln Asn Val Thr Glu Asn Ser Ile Phe Val
65 70 75 80

Gly Leu Ser His Asn Cys Thr Arg Lys Ile Glu Asp Met Asn Pro Leu

85 90 95

Phe Gly Glu Asn Phe Ala Pro Thr Ser Glu Asn Ser Phe Leu Met Glu

100 105 110

Asn Cys Asn Arg Thr Thr Asp Gly Cys Ser Ile Lys Gln Lys Phe Leu

115 120 125

Glu Asn Val Leu Lys Leu Lys Ser Cys Asp Ala Thr Gly Asn Ile Ser

130 135 140

Cys Phe Ser Leu Asp Ser Asn Ser Ser Ser Lys Asn Ser Ala Lys Phe
145 150 155 160

Phe Ser Met Lys Thr Leu Arg Asn Ser Ser Cys Ser Leu Leu Phe Ser

165 170 175

Ser Ile Ala Phe Glu Ser Val Gly Val Asn Ala Gly Ile Ala Leu Glu

180 185 190

Phe Glu Arg Val Arg Leu Gly Trp Trp Leu Lys Gly Gly Cys Gln Ser

195 200 205

Gly Thr Cys Ala Ala Asn Thr Asp Cys Thr Asp Val Glu Thr Pro His

210 215 220

Gly Tyr Ala Gly His Arg Cys Ser Cys Leu Asp Gly Phe His Gly Asp
225 230 235 240

Gly Tyr Thr Asn Pro Cys Gln Arg Ala Leu Pro Glu Cys Arg Gly Ser

245 250 255

Lys Leu Val Trp Arg His Cys Arg Ser Asn Leu Ile Thr Ile Val Gly

260 265 270

Gly Thr Val Gly Gly Ala Phe Leu Leu Ala Gly Leu Ala Leu Leu Phe

275 280 285

Phe Cys Lys Arg Val Tyr Ser Phe Glu Lys Ser Phe Lys Arg Lys Ala

290 295 300

Ser Phe Val
305

(2) INFORMATION FOR SEQ ID NO: 279:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 319 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..319

(D) OTHER INFORMATION: / Ceres Seq. ID 1582645
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279:
Met Asn Glu Ile Lys Leu Leu Ser Ser Val Ser His Pro Asn Leu Val
1 5 10 15

Arg Leu Leu Gly Cys Cys Ile Glu Gln Gly Asp Pro Val Leu Val Tyr

20 25 30

Glu Tyr Met Pro Asn Gly Thr Leu Ser Glu His Leu Gln Arg Asp Arg

35 40 45

Gly Ser Gly Leu Pro Trp Thr Leu Arg Leu Thr Val Ala Thr Gln Thr

50 55 60

Ala Lys Ala Ile Ala Tyr Leu His Ser Ser Met Asn Pro Pro Ile Tyr
65 70 75 80

His Arg Asp Ile Lys Ser Thr Asn Ile Leu Leu Asp Tyr Asp Phe Asn

85 90 95

Ser Lys Val Ala Asp Phe Gly Leu Ser Arg Leu Gly Met Thr Glu Ser

100 105 110

Ser His Ile Ser Thr Ala Pro Gln Gly Thr Pro Gly Tyr Leu Asp Pro

115 120 125

Gln Tyr His Gln Cys Phe His Leu Ser Asp Lys Ser Asp Val Tyr Ser

130 135 140

Phe Gly Val Val Leu Ala Glu Ile Ile Thr Gly Leu Lys Val Val Asp
145 150 155 160

Phe Thr Arg Pro His Thr Glu Ile Asn Leu Ala Ala Leu Ala Val Asp

165 170 175

Lys Ile Gly Ser Gly Cys Ile Asp Glu Ile Ile Asp Pro Ile Leu Asp

180 185 190

Leu Asp Leu Asp Ala Trp Thr Leu Ser Ser Ile His Thr Val Ala Glu

195 200 205

Leu Ala Phe Arg Cys Leu Ala Phe His Ser Asp Met Arg Pro Thr Met

210 215 220

Thr Glu Val Ala Asp Glu Leu Glu Gln Ile Arg Leu Ser Gly Trp Ile
225 230 235 240

Pro Ser Met Ser Leu Asp Ser Pro Ala Gly Ser Leu Arg Ser Ser Asp

245 250 255

Arg Gly Ser Glu Arg Ser Val Lys Gln Ser Ser Ile Gly Ser Arg Arg

260 265 270

Val Val Ile Pro Gln Lys Gln Pro Asp Cys Leu Ala Ser Val Glu Glu

275 280 285

Ile Ser Asp Ser Ser Pro Ile Ser Val Gln Asp Pro Trp Leu Ser Ala

290 295 300

Gln Ser Ser Pro Ser Thr Asn Thr Leu Leu Gly Asn Ile Pro Arg
305 310 315

(2) INFORMATION FOR SEQ ID NO: 280: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1263 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1263 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582654 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:280: 
gaaactctct ttcagatttc aagaaaccct taaaatcctc tctctgtctc tttctgtttg 60
tttgttttct tgtttccttc tctctctctc tttctttctt tgtcttcctt tcccaggttg 120
tttttttttg ctctctctgc cttcttgact ttcaaaagac tctttctttc ttttggattg 180
attwtggatt ctagggctct ctttctttta gtgggttttt gttcttgttg tggtctctct 240
gatgattact gaacttgaga tggggaaagg tgagagtgag cttgagcttg gtctagggct 300
gagtcttggc ggtggaacgg cggccaagat tggtaaatca ggtggtggtg gcgcgtgggg 360
agagcgtgga aggcttttga cggctaagga ttttccttct gttggttcta aacgtgctgc 420
tgattctgct tctcatgctg gttcatctcc tcctcgttca agtcaagttg ttggatggcc 480
tcctataggg tcacacagga tgaacagttt ggttaataac caagctacaa agtcagcaag 540
agaagaagaa gaagctggta agaagaaagt gaaagatgat gaacctaaag atgtgacaaa 600
gaaagtgaat gggaaagtac aagttggatt tattaaggtg aacatggatg gagttgctat 660
aggaagaaaa gtggatttga atgctcattc ttcttacgag aatttggcgc aaacattgga 720
agatatgttc tttcgcacta atccgggtac tgtcgggtta accagtcagt tcactaaacc 780
gttgaggctt ttagatggat cgtctgagtt tgtacttact tatgaagata aggaaggaga 840
ttggatgctt gttggtgatg ttccatggag aatgttcatc aactcggtga aaaggctacg 900
tgtgatgaaa acctctgaag ctaatggact cgctgcacga aatcaagaac caaacgagag 960
acagcgaaag cagccggttt agatctcttt tcgacgttac ggtgttacag gttttatatt 1020
ttggggtttt gcaagtctga gatacttctg aagcaagcat aagctagatt gatcttatat 1080
ccagtttgtg tattttcttg gttcttataa tggtttttac tggttttctt tagttttttt 1140
ttttgctgtc ttttaatttt cggttgcgat ttcactatat actatggatg gaagagaatg 1200
ctctttatat cttttactac actgtaaata tttgaagctt atctaatatc gtttttaagg 1260
gtc

(2) INFORMATION FOR SEQ ID NO: 281: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 246 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..246

(D) OTHER INFORMATION: / Ceres Seq. ID 1582655
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281:
Met Ile Thr Glu Leu Glu Met Gly Lys Gly Glu Ser Glu Leu Glu Leu
1 5 10 15

Gly Leu Gly Leu Ser Leu Gly Gly Gly Thr Ala Ala Lys Ile Gly Lys

20 25 30

Ser Gly Gly Gly Gly Ala Trp Gly Glu Arg Gly Arg Leu Leu Thr Ala

35 40 45

Lys Asp Phe Pro Ser Val Gly Ser Lys Arg Ala Ala Asp Ser Ala Ser

50 55 60

His Ala Gly Ser Ser Pro Pro Arg Ser Ser Gln Val Val Gly Trp Pro
65 70 75 80

Pro Ile Gly Ser His Arg Met Asn Ser Leu Val Asn Asn Gln Ala Thr

85 90 95

Lys Ser Ala Arg Glu Glu Glu Glu Ala Gly Lys Lys Lys Val Lys Asp

100 105 110

Asp Glu Pro Lys Asp Val Thr Lys Lys Val Asn Gly Lys Val Gln Val

115 120 125

Gly Phe Ile Lys Val Asn Met Asp Gly Val Ala Ile Gly Arg Lys Val

130 135 140

Asp Leu Asn Ala His Ser Ser Tyr Glu Asn Leu Ala Gln Thr Leu Glu
145 150 155 160

Asp Met Phe Phe Arg Thr Asn Pro Gly Thr Val Gly Leu Thr Ser Gln

165 170 175

Phe Thr Lys Pro Leu Arg Leu Leu Asp Gly Ser Ser Glu Phe Val Leu

180 185 190

Thr Tyr Glu Asp Lys Glu Gly Asp Trp Met Leu Val Gly Asp Val Pro

195 200 205

Trp Arg Met Phe Ile Asn Ser Val Lys Arg Leu Arg Val Met Lys Thr

210 215 220

Ser Glu Ala Asn Gly Leu Ala Ala Arg Asn Gln Glu Pro Asn Glu Arg
225 230 235 240

Gln Arg Lys Gln Pro Val
245

(2) INFORMATION FOR SEQ ID NO: 282: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..240

(D) OTHER INFORMATION: / Ceres Seq. ID 1582656
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282:
Met Gly Lys Gly Glu Ser Glu Leu Glu Leu Gly Leu Gly Leu Ser Leu
1 5 10 15

Gly Gly Gly Thr Ala Ala Lys Ile Gly Lys Ser Gly Gly Gly Gly Ala

20 25 30

Trp Gly Glu Arg Gly Arg Leu Leu Thr Ala Lys Asp Phe Pro Ser Val

35 40 45

Gly Ser Lys Arg Ala Ala Asp Ser Ala Ser His Ala Gly Ser Ser Pro

50 55 60

Pro Arg Ser Ser Gln Val Val Gly Trp Pro Pro Ile Gly Ser His Arg
65 70 75 80

Met Asn Ser Leu Val Asn Asn Gln Ala Thr Lys Ser Ala Arg Glu Glu

85 90 95

Glu Glu Ala Gly Lys Lys Lys Val Lys Asp Asp Glu Pro Lys Asp Val

100 105 110

Thr Lys Lys Val Asn Gly Lys Val Gln Val Gly Phe Ile Lys Val Asn

115 120 125

Met Asp Gly Val Ala Ile Gly Arg Lys Val Asp Leu Asn Ala His Ser

130 135 140

Ser Tyr Glu Asn Leu Ala Gln Thr Leu Glu Asp Met Phe Phe Arg Thr
145 150 155 160

Asn Pro Gly Thr Val Gly Leu Thr Ser Gln Phe Thr Lys Pro Leu Arg

165 170 175

Leu Leu Asp Gly Ser Ser Glu Phe Val Leu Thr Tyr Glu Asp Lys Glu

180 185 190

Gly Asp Trp Met Leu Val Gly Asp Val Pro Trp Arg Met Phe Ile Asn

195 200 205

Ser Val Lys Arg Leu Arg Val Met Lys Thr Ser Glu Ala Asn Gly Leu

210 215 220

Ala Ala Arg Asn Gln Glu Pro Asn Glu Arg Gln Arg Lys Gln Pro Val
225 230 235 240



(2) INFORMATION FOR SEQ ID NO: 283: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..160

(D) OTHER INFORMATION: / Ceres Seq. ID 1582657
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283:
Met Asn Ser Leu Val Asn Asn Gln Ala Thr Lys Ser Ala Arg Glu Glu
1 5 10 15

Glu Glu Ala Gly Lys Lys Lys Val Lys Asp Asp Glu Pro Lys Asp Val

20 25 30

Thr Lys Lys Val Asn Gly Lys Val Gln Val Gly Phe Ile Lys Val Asn

35 40 45

Met Asp Gly Val Ala Ile Gly Arg Lys Val Asp Leu Asn Ala His Ser

50 55 60

Ser Tyr Glu Asn Leu Ala Gln Thr Leu Glu Asp Met Phe Phe Arg Thr
65 70 75 80

Asn Pro Gly Thr Val Gly Leu Thr Ser Gln Phe Thr Lys Pro Leu Arg

85 90 95

Leu Leu Asp Gly Ser Ser Glu Phe Val Leu Thr Tyr Glu Asp Lys Glu

100 105 110

Gly Asp Trp Met Leu Val Gly Asp Val Pro Trp Arg Met Phe Ile Asn

115 120 125

Ser Val Lys Arg Leu Arg Val Met Lys Thr Ser Glu Ala Asn Gly Leu

130 135 140

Ala Ala Arg Asn Gln Glu Pro Asn Glu Arg Gln Arg Lys Gln Pro Val
145 150 155 160

(2) INFORMATION FOR SEQ ID NO: 284: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1555 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1555

(D) OTHER INFORMATION: / Ceres Seq. ID 1582658
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284:
aattcaccac catgaactct tcctttactc tcttcatctt taccttcgtc atcttcctcc 60
aatatctaaa tcccaccgga gctgccacgt gtcatcctga tgatgaagcg ggtcttctag 120
ctttcaaagc gggtataacc cgagatcctt cgggtattct cagttcttgg aagaaaggta 180
ccgcttgttg ttcttggaac ggtgtcactt gtctcaccac tgaccgtgtc tctgcactct 240
ctgtcgctgg acaagccgat gttgctggaa gcttcctctc cggcactctc tcgccgtcgt 300
tggctaaact caagcacctt gatgggattt acttcaccga tctcaagaac atcactggtt 360
cttttcctca attccttttc caattaccaa atcttaagta cgtatacatt gagaataacc 420
gtctctctgg tcctcttccg gctaacatcg gtgcgctaag ccagcttgaa gcgttcagcc 480
tcgagggaaa ccggttcacc ggtccgatcc cgagctcgat atctaatttg actcggttaa 540
ctcaactcaa actcggcaat aatcttctaa ccggaactat accgttaggg gtcgctaatc 600
tcaagctcat gtcgtatctc aaccttggag gtaaccgtct cactggaacc attccagata 660
ttttcaaatc catgccagag ctccgatctt taactctctc ccgcaacgga ttctccggga 720
atcttcctcc gtccattgca tcactcgcac cgattcttag gttcctcgag ttaggccata 780
acaaactctc cgggacgatt ccaaactttt tatcaaactt caaggcgctc gacacattgg 840
atctctccaa gaatcggttc tcgggtgtca taccgaagag tttcgccaat ctgaccaaga 900
tattcaatct tgatctctcc cataatcttc taaccgatcc attccctgtc ttgaacgtta 960
aaggcattga atctctggat ctctcgtaca acaagtttca cttgaatacg atcccgaaat 1020
gggtgacttc gtcgccgatc atcttctcgt tgaagctagc aaaatgcggg atcaagatga 1080
gcttagacga ttggaagcca gcgcaaacat tctactatga tttcatcgat ctgtcggaaa 1140
acgagatcac aggtagtcca gcaaggttct tgaaccaaac agagtatcta gtggagttca 1200
aggcagcggg taacaaacta cgattcgata tggggaagct aacgtttgca aagacgctga 1260
caactttaga tatatcaagg aacttggtat ttgggaaggt gccggcaatg gtggctggac 1320
taaagacatt gaacgtgagt cacaaccatc tttgtggaaa gcttccagta acaaagttcc 1380
cggccagtgc gtttgtgggt aatgactgtc tttgcggctc tcctctttct ccttgtaaag 1440
cttaagcggc aagaaagcta caactggtag ttggatattt accgtgacca ataaagttta 1500
gctccaaata aaatttgtat taaaaagatt acaacaaata aagtagtttt atgtt
(2) INFORMATION FOR SEQ ID NO: 285:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 480 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..480

(D) OTHER INFORMATION: / Ceres Seq. ID 1582659
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285:
Phe Thr Thr Met Asn Ser Ser Phe Thr Leu Phe Ile Phe Thr Phe Val
1 5 10 15

Ile Phe Leu Gln Cys Leu Asn Pro Thr Gly Ala Ala Thr Cys His Pro
20 25 30

Asp Asp Glu Ala Gly Leu Leu Ala Phe Lys Ala Gly Ile Thr Arg Asp
35 40 45

Pro Ser Gly Ile Leu Ser Ser Trp Lys Lys Gly Thr Ala Cys Cys Ser
50 55 60

Trp Asn Gly Val Thr Cys Leu Thr Thr Asp Arg Val Ser Ala Leu Ser
65 70 75 80

Val Ala Gly Gln Ala Asp Val Ala Gly Ser Phe Leu Ser Gly Thr Leu
85 90 95

Ser Pro Ser Leu Ala Lys Leu Lys His Leu Asp Gly Ile Tyr Phe Thr
100 105 110

Asp Leu Lys Asn Ile Thr Gly Ser Phe Pro Gln Phe Leu Phe Gln Leu
115 120 125

Pro Asn Leu Lys Tyr Val Tyr Ile Glu Asn Asn Arg Leu Ser Gly Pro
130 135 140

Leu Pro Ala Asn Ile Gly Ala Leu Ser Gln Leu Glu Ala Phe Ser Leu
145 150 155 160

Glu Gly Asn Arg Phe Thr Gly Pro Ile Pro Ser Ser Ile Ser Asn Leu
165 170 175

Thr Arg Leu Thr Gln Leu Lys Leu Gly Asn Asn Leu Leu Thr Gly Thr
180 185 190

Ile Pro Leu Gly Val Ala Asn Leu Lys Leu Met Ser Tyr Leu Asn Leu
195 200 205

Gly Gly Asn Arg Leu Thr Gly Thr Ile Pro Asp Ile Phe Lys Ser Met
210 215 220

Pro Glu Leu Arg Ser Leu Thr Leu Ser Arg Asn Gly Phe Ser Gly Asn
225 230 235 240

Leu Pro Pro Ser Ile Ala Ser Leu Ala Pro Ile Leu Arg Phe Leu Glu
245 250 255

Leu Gly His Asn Lys Leu Ser Gly Thr Ile Pro Asn Phe Leu Ser Asn
260 265 270

Phe Lys Ala Leu Asp Thr Leu Asp Leu Ser Lys Asn Arg Phe Ser Gly
275 280 285

Val Ile Pro Lys Ser Phe Ala Asn Leu Thr Lys Ile Phe Asn Leu Asp
290 295 300

Leu Ser His Asn Leu Leu Thr Asp Pro Phe Pro Val Leu Asn Val Lys
305 310 315 320

Gly Ile Glu Ser Leu Asp Leu Ser Tyr Asn Lys Phe His Leu Asn Thr
325 330 335

Ile Pro Lys Trp Val Thr Ser Ser Pro Ile Ile Phe Ser Leu Lys Leu
340 345 350

Ala Lys Cys Gly Ile Lys Met Ser Leu Asp Asp Trp Lys Pro Ala Gln
355 360 365

Thr Phe Tyr Tyr Asp Phe Ile Asp Leu Ser Glu Asn Glu Ile Thr Gly
370 375 380

Ser Pro Ala Arg Phe Leu Asn Gln Thr Glu Tyr Leu Val Glu Phe Lys
385 390 395 400

Ala Ala Gly Asn Lys Leu Arg Phe Asp Met Gly Lys Leu Thr Phe Ala
405 410 415

Lys Thr Leu Thr Thr Leu Asp Ile Ser Arg Asn Leu Val Phe Gly Lys
420 425 430

Val Pro Ala Met Val Ala Gly Leu Lys Thr Leu Asn Val Ser His Asn
435 440 445

His Leu Cys Gly Lys Leu Pro Val Thr Lys Phe Pro Ala Ser Ala Phe
450 455 460

Val Gly Asn Asp Cys Leu Cys Gly Ser Pro Leu Ser Pro Cys Lys Ala
465 470 475 480

(2) INFORMATION FOR SEQ ID NO: 286:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 477 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..477

(D) OTHER INFORMATION: / Ceres Seq. ID 1582660
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286:
Met Asn Ser Ser Phe Thr Leu Phe Ile Phe Thr Phe Val Ile Phe Leu
1 5 10 15

Gln Cys Leu Asn Pro Thr Gly Ala Ala Thr Cys His Pro Asp Asp Glu
20 25 30

Ala Gly Leu Leu Ala Phe Lys Ala Gly Ile Thr Arg Asp Pro Ser Gly
35 40 45

Ile Leu Ser Ser Trp Lys Lys Gly Thr Ala Cys Cys Ser Trp Asn Gly
50 55 60

Val Thr Cys Leu Thr Thr Asp Arg Val Ser Ala Leu Ser Val Ala Gly
65 70 75 80

Gln Ala Asp Val Ala Gly Ser Phe Leu Ser Gly Thr Leu Ser Pro Ser
85 90 95

Leu Ala Lys Leu Lys His Leu Asp Gly Ile Tyr Phe Thr Asp Leu Lys
100 105 110

Asn Ile Thr Gly Ser Phe Pro Gln Phe Leu Phe Gln Leu Pro Asn Leu
115 120 125

Lys Tyr Val Tyr Ile Glu Asn Asn Arg Leu Ser Gly Pro Leu Pro Ala
130 135 140

Asn Ile Gly Ala Leu Ser Gln Leu Glu Ala Phe Ser Leu Glu Gly Asn
145 150 155 160

Arg Phe Thr Gly Pro Ile Pro Ser Ser Ile Ser Asn Leu Thr Arg Leu
165 170 175

Thr Gln Leu Lys Leu Gly Asn Asn Leu Leu Thr Gly Thr Ile Pro Leu
180 185 190

Gly Val Ala Asn Leu Lys Leu Met Ser Tyr Leu Asn Leu Gly Gly Asn
195 200 205

Arg Leu Thr Gly Thr Ile Pro Asp Ile Phe Lys Ser Met Pro Glu Leu
210 215 220

Arg Ser Leu Thr Leu Ser Arg Asn Gly Phe Ser Gly Asn Leu Pro Pro
225 230 235 240

Ser Ile Ala Ser Leu Ala Pro Ile Leu Arg Phe Leu Glu Leu Gly His
245 250 255

Asn Lys Leu Ser Gly Thr Ile Pro Asn Phe Leu Ser Asn Phe Lys Ala
260 265 270

Leu Asp Thr Leu Asp Leu Ser Lys Asn Arg Phe Ser Gly Val Ile Pro
275 280 285

Lys Ser Phe Ala Asn Leu Thr Lys Ile Phe Asn Leu Asp Leu Ser His
290 295 300

Asn Leu Leu Thr Asp Pro Phe Pro Val Leu Asn Val Lys Gly Ile Glu
305 310 315 320

Ser Leu Asp Leu Ser Tyr Asn Lys Phe His Leu Asn Thr Ile Pro Lys
325 330 335

Trp Val Thr Ser Ser Pro Ile Ile Phe Ser Leu Lys Leu Ala Lys Cys
340 345 350

Gly Ile Lys Met Ser Leu Asp Asp Trp Lys Pro Ala Gln Thr Phe Tyr
355 360 365

Tyr Asp Phe Ile Asp Leu Ser Glu Asn Glu Ile Thr Gly Ser Pro Ala
370 375 380

Arg Phe Leu Asn Gln Thr Glu Tyr Leu Val Glu Phe Lys Ala Ala Gly
385 390 395 400

Asn Lys Leu Arg Phe Asp Met Gly Lys Leu Thr Phe Ala Lys Thr Leu
405 410 415

Thr Thr Leu Asp Ile Ser Arg Asn Leu Val Phe Gly Lys Val Pro Ala
420 425 430

Met Val Ala Gly Leu Lys Thr Leu Asn Val Ser His Asn His Leu Cys
435 440 445

Gly Lys Leu Pro Val Thr Lys Phe Pro Ala Ser Ala Phe Val Gly Asn
450 455 460

Asp Cys Leu Cys Gly Ser Pro Leu Ser Pro Cys Lys Ala
465 470 475

(2) INFORMATION FOR SEQ ID NO: 287: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 278 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..278 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582661 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287: 
Met Ser Tyr Leu Asn Leu Gly Gly Asn Arg Leu Thr Gly Thr Ile Pro
1 5 10 15

Asp Ile Phe Lys Ser Met Pro Glu Leu Arg Ser Leu Thr Leu Ser Arg
20 25 30

Asn Gly Phe Ser Gly Asn Leu Pro Pro Ser Ile Ala Ser Leu Ala Pro
35 40 45

Ile Leu Arg Phe Leu Glu Leu Gly His Asn Lys Leu Ser Gly Thr Ile
50 55 60

Pro Asn Phe Leu Ser Asn Phe Lys Ala Leu Asp Thr Leu Asp Leu Ser
65 70 75 80

Lys Asn Arg Phe Ser Gly Val Ile Pro Lys Ser Phe Ala Asn Leu Thr
85 90 95

Lys Ile Phe Asn Leu Asp Leu Ser His Asn Leu Leu Thr Asp Pro Phe
100 105 110

Pro Val Leu Asn Val Lys Gly Ile Glu Ser Leu Asp Leu Ser Tyr Asn
115 120 125

Lys Phe His Leu Asn Thr Ile Pro Lys Trp Val Thr Ser Ser Pro Ile
130 135 140

Ile Phe Ser Leu Lys Leu Ala Lys Cys Gly Ile Lys Met Ser Leu Asp
145 150 155 160

Asp Trp Lys Pro Ala Gln Thr Phe Tyr Tyr Asp Phe Ile Asp Leu Ser
165 170 175

Glu Asn Glu Ile Thr Gly Ser Pro Ala Arg Phe Leu Asn Gln Thr Glu
180 185 190

Tyr Leu Val Glu Phe Lys Ala Ala Gly Asn Lys Leu Arg Phe Asp Met
195 200 205

Gly Lys Leu Thr Phe Ala Lys Thr Leu Thr Thr Leu Asp Ile Ser Arg
210 215 220

Asn Leu Val Phe Gly Lys Val Pro Ala Met Val Ala Gly Leu Lys Thr
225 230 235 240

Leu Asn Val Ser His Asn His Leu Cys Gly Lys Leu Pro Val Thr Lys
245 250 255

Phe Pro Ala Ser Ala Phe Val Gly Asn Asp Cys Leu Cys Gly Ser Pro
260 265 270

Leu Ser Pro Cys Lys Ala
275

(2) INFORMATION FOR SEQ ID NO: 288:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1914 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1914 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582689 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288: 
cacaaagcag cttctgcaac ctattcgttg gttggttaat agaagctctc tttctctgtc 60
tgtctttctc tctcagagaa atggctcttc ttctagtttc ttcttcctcc tcctatgccc 120
tcagagtcac cattttcttg tctttcttct tctttctctg caatggcttc tcttacccta 180
ctacttcttc tcttttcaac acccatcacc atcgtcacca cttggccaag cacaactaca 240
aagatgctct cactaaatca atcctcttct ttgaaggcca aaggtcaggg aaacttcctt 300
ctaaccagag aatgagttgg agaagagact ctggtctctc tgatggctct gctcttcatg 360
tggatttggt tggagggtac tatgatgcag gagvcaatat caaatttgga ttcccaatgg 420
cattcacaac cacaatgctt tcatggagtg taattgaatt cggtggactc atgaaatctg 480
agttacaaaa cgctaaaata gcgattcgtt gggctactga ttatctcctc aaagccactt 540
cacaacctga cacaatctat gttcaagttg gtgatgctaa taaagaccat tcttgttggg 600
aaagaccaga agacatggat actgtaagaa gtgtgtttaa agttgacaag aacactcctg 660
gttctgatgt cgccgctgaa accgccgccg ctctagccgc cgccgccatt gtattcagaa 720
aatctgatcc ttcttactcc aaagtcctcc tcaaacgagc catcagtgtt tttgcatttg 780
cggacaaata cagaggaact tatagtgcag gattaaaacc tgatgtttgt ccattttatt 840
gctcttactc tggttatcag gatgaattgt tgtggggagc tgcttggtta caaaacgcga 900
caaagaattt aaaatatttg aattacataa aaatcaatgg acaaatcctt ggagctgctg 960
aatatgataa cacttttggt tgggataaca agcacgctgg tgccagaatc cttcttacaa 1020
aggcattttt ggttcagaat gtgaagacac ttcatgaata caaaggtcat gctgataatt 1080
tcatctgctc tgttattcct ggagctcctt tctcttctac tcagtataca ccaggtggat 1140
tattgtttaa aatggcagac gccaacatgc aatacgtgac gtcaacatcg ttcttgctct 1200
taacctatgc caaatactta acctccgcca aaaccgtcgt ccattgcggt ggctccgtct 1260
acactcccgg tcgtcttcgc tccatcgcca aaagacaggt ggattatcta cttggagaca 1320
acccattaag aatgtcttac atggttggtt acggtccaaa attcccacgg agaatccacc 1380
accgtggctc ctcattacct tgtgttgcaa gccacccggc caagatccaa tgccaccaag 1440
ggtttgcaat catgaactct caatctccaa accctaactt ccttgttggt gcagtcgttg 1500
gtggtcccga ccagcatgat cgcttcccag acgaacggtc tgattacgag cagtccgagc 1560
cggctactta catcaattca ccactcgttg gagctcttgc ctatttcgct cacgcctatg 1620
gtcaactcta gtttagtaac gacgagtgtg ttagtttaag taaaaataaa aatgaaggaa 1680
gttttttctt tatttttact tttatttgtt agtaatgtag tggaccgaaa atcggatcac 1740
aagaggacat tggtccgagg gatggtttat ttggttcgtt ataatataac gtcaagtgta 1800
atcttattgt ggttattaat gttatcatcc tattaattac tatatccatg tcgttaattt 1860
ttgatatgtt tatatgattt ttcatatttt tgtgaaaaaa aarwwaaact tttg 1914
(2) INFORMATION FOR SEQ ID NO: 289:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..542

(D) OTHER INFORMATION: / Ceres Seq. ID 1582690
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289:
Gln Ser Ser Phe Cys Asn Leu Phe Val Gly Trp Leu Ile Glu Ala Leu
1 5 10 15

Phe Leu Cys Leu Ser Phe Ser Leu Arg Glu Met Ala Leu Leu Leu Val
20 25 30

Ser Ser Ser Ser Ser Tyr Ala Leu Arg Val Thr Ile Phe Leu Ser Phe
35 40 45

Phe Phe Phe Leu Cys Asn Gly Phe Ser Tyr Pro Thr Thr Ser Ser Leu
50 55 60

Phe Asn Thr His His His Arg His His Leu Ala Lys His Asn Tyr Lys
65 70 75 80

Asp Ala Leu Thr Lys Ser Ile Leu Phe Phe Glu Gly Gln Arg Ser Gly
85 90 95

Lys Leu Pro Ser Asn Gln Arg Met Ser Trp Arg Arg Asp Ser Gly Leu
100 105 110

Ser Asp Gly Ser Ala Leu His Val Asp Leu Val Gly Gly Tyr Tyr Asp
115 120 125

Ala Gly Xaa Asn Ile Lys Phe Gly Phe Pro Met Ala Phe Thr Thr Thr
130 135 140

Met Leu Ser Trp Ser Val Ile Glu Phe Gly Gly Leu Met Lys Ser Glu
145 150 155 160

Leu Gln Asn Ala Lys Ile Ala Ile Arg Trp Ala Thr Asp Tyr Leu Leu
165 170 175

Lys Ala Thr Ser Gln Pro Asp Thr Ile Tyr Val Gln Val Gly Asp Ala
180 185 190

Asn Lys Asp His Ser Cys Trp Glu Arg Pro Glu Asp Met Asp Thr Val
195 200 205

Arg Ser Val Phe Lys Val Asp Lys Asn Thr Pro Gly Ser Asp Val Ala
210 215 220

Ala Glu Thr Ala Ala Ala Leu Ala Ala Ala Ala Ile Val Phe Arg Lys
225 230 235 240

Ser Asp Pro Ser Tyr Ser Lys Val Leu Leu Lys Arg Ala Ile Ser Val
245 250 255

Phe Ala Phe Ala Asp Lys Tyr Arg Gly Thr Tyr Ser Ala Gly Leu Lys
260 265 270

Pro Asp Val Cys Pro Phe Tyr Cys Ser Tyr Ser Gly Tyr Gln Asp Glu
275 280 285

Leu Leu Trp Gly Ala Ala Trp Leu Gln Asn Ala Thr Lys Asn Leu Lys
290 295 300

Tyr Leu Asn Tyr Ile Lys Ile Asn Gly Gln Ile Leu Gly Ala Ala Glu
305 310 315 320

Tyr Asp Asn Thr Phe Gly Trp Asp Asn Lys His Ala Gly Ala Arg Ile
325 330 335

Leu Leu Thr Lys Ala Phe Leu Val Gln Asn Val Lys Thr Leu His Glu
340 345 350

Tyr Lys Gly His Ala Asp Asn Phe Ile Cys Ser Val Ile Pro Gly Ala
355 360 365

Pro Phe Ser Ser Thr Gln Tyr Thr Pro Gly Gly Leu Leu Phe Lys Met
370 375 380

Ala Asp Ala Asn Met Gln Tyr Val Thr Ser Thr Ser Phe Leu Leu Leu
385 390 395 400

Thr Tyr Ala Lys Tyr Leu Thr Ser Ala Lys Thr Val Val His Cys Gly
405 410 415

Gly Ser Val Tyr Thr Pro Gly Arg Leu Arg Ser Ile Ala Lys Arg Gln
420 425 430

Val Asp Tyr Leu Leu Gly Asp Asn Pro Leu Arg Met Ser Tyr Met Val
435 440 445

Gly Tyr Gly Pro Lys Phe Pro Arg Arg Ile His His Arg Gly Ser Ser
450 455 460

Leu Pro Cys Val Ala Ser His Pro Ala Lys Ile Gln Cys His Gln Gly
465 470 475 480

Phe Ala Ile Met Asn Ser Gln Ser Pro Asn Pro Asn Phe Leu Val Gly
485 490 495

Ala Val Val Gly Gly Pro Asp Gln His Asp Arg Phe Pro Asp Glu Arg
500 505 510

Ser Asp Tyr Glu Gln Ser Glu Pro Ala Thr Tyr Ile Asn Ser Pro Leu
515 520 525

Val Gly Ala Leu Ala Tyr Phe Ala His Ala Tyr Gly Gln Leu
530 535 540

(2) INFORMATION FOR SEQ ID NO: 290:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 516 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..516

(D) OTHER INFORMATION: / Ceres Seq. ID 1582691
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
Met Ala Leu Leu Leu Val Ser Ser Ser Ser Ser Tyr Ala Leu Arg Val
1 5 10 15

Thr Ile Phe Leu Ser Phe Phe Phe Phe Leu Cys Asn Gly Phe Ser Tyr
20 25 30

Pro Thr Thr Ser Ser Leu Phe Asn Thr His His His Arg His His Leu
35 40 45

Ala Lys His Asn Tyr Lys Asp Ala Leu Thr Lys Ser Ile Leu Phe Phe
50 55 60

Glu Gly Gln Arg Ser Gly Lys Leu Pro Ser Asn Gln Arg Met Ser Trp
65 70 75 80

Arg Arg Asp Ser Gly Leu Ser Asp Gly Ser Ala Leu His Val Asp Leu
85 90 95

Val Gly Gly Tyr Tyr Asp Ala Gly Xaa Asn Ile Lys Phe Gly Phe Pro
100 105 110

Met Ala Phe Thr Thr Thr Met Leu Ser Trp Ser Val Ile Glu Phe Gly
115 120 125

Gly Leu Met Lys Ser Glu Leu Gln Asn Ala Lys Ile Ala Ile Arg Trp
130 135 140

Ala Thr Asp Tyr Leu Leu Lys Ala Thr Ser Gln Pro Asp Thr Ile Tyr
145 150 155 160

Val Gln Val Gly Asp Ala Asn Lys Asp His Ser Cys Trp Glu Arg Pro
165 170 175

Glu Asp Met Asp Thr Val Arg Ser Val Phe Lys Val Asp Lys Asn Thr
180 185 190

Pro Gly Ser Asp Val Ala Ala Glu Thr Ala Ala Ala Leu Ala Ala Ala
195 200 205

Ala Ile Val Phe Arg Lys Ser Asp Pro Ser Tyr Ser Lys Val Leu Leu
210 215 220

Lys Arg Ala Ile Ser Val Phe Ala Phe Ala Asp Lys Tyr Arg Gly Thr
225 230 235 240

Tyr Ser Ala Gly Leu Lys Pro Asp Val Cys Pro Phe Tyr Cys Ser Tyr
245 250 255

Ser Gly Tyr Gln Asp Glu Leu Leu Trp Gly Ala Ala Trp Leu Gln Asn
260 265 270

Ala Thr Lys Asn Leu Lys Tyr Leu Asn Tyr Ile Lys Ile Asn Gly Gln
275 280 285

Ile Leu Gly Ala Ala Glu Tyr Asp Asn Thr Phe Gly Trp Asp Asn Lys
290 295 300

His Ala Gly Ala Arg Ile Leu Leu Thr Lys Ala Phe Leu Val Gln Asn
305 310 315 320

Val Lys Thr Leu His Glu Tyr Lys Gly His Ala Asp Asn Phe Ile Cys
325 330 335

Ser Val Ile Pro Gly Ala Pro Phe Ser Ser Thr Gln Tyr Thr Pro Gly
340 345 350

Gly Leu Leu Phe Lys Met Ala Asp Ala Asn Met Gln Tyr Val Thr Ser
355 360 365

Thr Ser Phe Leu Leu Leu Thr Tyr Ala Lys Tyr Leu Thr Ser Ala Lys
370 375 380

Thr Val Val His Cys Gly Gly Ser Val Tyr Thr Pro Gly Arg Leu Arg
385 390 395 400

Ser Ile Ala Lys Arg Gln Val Asp Tyr Leu Leu Gly Asp Asn Pro Leu
405 410 415

Arg Met Ser Tyr Met Val Gly Tyr Gly Pro Lys Phe Pro Arg Arg Ile
420 425 430

His His Arg Gly Ser Ser Leu Pro Cys Val Ala Ser His Pro Ala Lys
435 440 445

Ile Gln Cys His Gln Gly Phe Ala Ile Met Asn Ser Gln Ser Pro Asn
450 455 460

Pro Asn Phe Leu Val Gly Ala Val Val Gly Gly Pro Asp Gln His Asp
465 470 475 480

Arg Phe Pro Asp Glu Arg Ser Asp Tyr Glu Gln Ser Glu Pro Ala Thr
485 490 495

Tyr Ile Asn Ser Pro Leu Val Gly Ala Leu Ala Tyr Phe Ala His Ala
500 505 510

Tyr Gly Gln Leu
515

(2) INFORMATION FOR SEQ ID NO: 291: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 439 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..439 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582692 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291:
Met Ser Trp Arg Arg Asp Ser Gly Leu Ser Asp Gly Ser Ala Leu His
1 5 10 15

Val Asp Leu Val Gly Gly Tyr Tyr Asp Ala Gly Xaa Asn Ile Lys Phe
20 25 30

Gly Phe Pro Met Ala Phe Thr Thr Thr Met Leu Ser Trp Ser Val Ile
35 40 45

Glu Phe Gly Gly Leu Met Lys Ser Glu Leu Gln Asn Ala Lys Ile Ala
50 55 60

Ile Arg Trp Ala Thr Asp Tyr Leu Leu Lys Ala Thr Ser Gln Pro Asp
65 70 75 80

Thr Ile Tyr Val Gln Val Gly Asp Ala Asn Lys Asp His Ser Cys Trp
85 90 95

Glu Arg Pro Glu Asp Met Asp Thr Val Arg Ser Val Phe Lys Val Asp
100 105 110

Lys Asn Thr Pro Gly Ser Asp Val Ala Ala Glu Thr Ala Ala Ala Leu
115 120 125

Ala Ala Ala Ala Ile Val Phe Arg Lys Ser Asp Pro Ser Tyr Ser Lys
130 135 140

Val Leu Leu Lys Arg Ala Ile Ser Val Phe Ala Phe Ala Asp Lys Tyr
145 150 155 160

Arg Gly Thr Tyr Ser Ala Gly Leu Lys Pro Asp Val Cys Pro Phe Tyr
165 170 175

Cys Ser Tyr Ser Gly Tyr Gln Asp Glu Leu Leu Trp Gly Ala Ala Trp
180 185 190

Leu Gln Asn Ala Thr Lys Asn Leu Lys Tyr Leu Asn Tyr Ile Lys Ile
195 200 205

Asn Gly Gln Ile Leu Gly Ala Ala Glu Tyr Asp Asn Thr Phe Gly Trp
210 215 220

Asp Asn Lys His Ala Gly Ala Arg Ile Leu Leu Thr Lys Ala Phe Leu
225 230 235 240

Val Gln Asn Val Lys Thr Leu His Glu Tyr Lys Gly His Ala Asp Asn
245 250 255

Phe Ile Cys Ser Val Ile Pro Gly Ala Pro Phe Ser Ser Thr Gln Tyr
260 265 270

Thr Pro Gly Gly Leu Leu Phe Lys Met Ala Asp Ala Asn Met Gln Tyr
275 280 285

Val Thr Ser Thr Ser Phe Leu Leu Leu Thr Tyr Ala Lys Tyr Leu Thr
290 295 300

Ser Ala Lys Thr Val Val His Cys Gly Gly Ser Val Tyr Thr Pro Gly
305 310 315 320

Arg Leu Arg Ser Ile Ala Lys Arg Gln Val Asp Tyr Leu Leu Gly Asp
325 330 335

Asn Pro Leu Arg Met Ser Tyr Met Val Gly Tyr Gly Pro Lys Phe Pro
340 345 350

Arg Arg Ile His His Arg Gly Ser Ser Leu Pro Cys Val Ala Ser His
355 360 365

Pro Ala Lys Ile Gln Cys His Gln Gly Phe Ala Ile Met Asn Ser Gln
370 375 380

Ser Pro Asn Pro Asn Phe Leu Val Gly Ala Val Val Gly Gly Pro Asp
385 390 395 400

Gln His Asp Arg Phe Pro Asp Glu Arg Ser Asp Tyr Glu Gln Ser Glu
405 410 415

Pro Ala Thr Tyr Ile Asn Ser Pro Leu Val Gly Ala Leu Ala Tyr Phe
420 425 430

Ala His Ala Tyr Gly Gln Leu
435

(2) INFORMATION FOR SEQ ID NO: 292:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 491 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..491

(D) OTHER INFORMATION: / Ceres Seq. ID 1582700
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaggtaccct gacttggaga acaccgccac 180
cgagactctt gtcctcggtg ttgctccggc gatgaactct cagtacgagg cttccggcga 240
gactttcgtt gccgagaatg atgcctgcaa atgcggatct gactgcaagt gcaacccttg 300
tacctgcaaa tgaaaaactt cataaaccct aagtctgtaa taaccctaat gttatgttag 360
gtttgcttat atgtaataat tggctgattt ttccggtagt tttgccggcg acgttggtct 420
ttctcttctt cttcttcttc tgtgtgtgtt tttatgtttt attaatccta agactattat 480
gggtttgtat c 491

(2) INFORMATION FOR SEQ ID NO: 293:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 70 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..70

(D) OTHER INFORMATION: / Ceres Seq. ID 1582701
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Gly Thr Leu Thr Trp Arg Thr Pro Pro Pro Arg Leu Leu Ser
50 55 60

Ser Val Leu Leu Arg Arg
65 70

(2) INFORMATION FOR SEQ ID NO: 294:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..44

(D) OTHER INFORMATION: / Ceres Seq. ID 1582702
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Gly Thr Leu Thr Trp Arg Thr Pro
20 25 30

Pro Pro Arg Leu Leu Ser Ser Val Leu Leu Arg Arg
35 40

(2) INFORMATION FOR SEQ ID NO: 295:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..46

(D) OTHER INFORMATION: / Ceres Seq. ID 1582703
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295:
Met Leu Gly Leu Leu Ile Cys Asn Asn Trp Leu Ile Phe Pro Val Val
1 5 10 15

Leu Pro Ala Thr Leu Val Phe Leu Phe Phe Phe Phe Phe Cys Val Cys
20 25 30

Phe Tyr Val Leu Leu Ile Leu Arg Leu Leu Trp Val Cys Ile
35 40 45

(2) INFORMATION FOR SEQ ID NO: 296: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 505 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY: - 

(B) LOCATION: 1..505

(D) OTHER INFORMATION: / Ceres Seq. ID 1582712
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgttggtc 420
tttctcttct tcttcttctt ctgtgtgtgt ttttatggtt tggtcattaa gatatctctg 480
caaagtttgc atatggttta tactc 505
(2) INFORMATION FOR SEQ ID NO: 297: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582713 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: 
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 298:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1582714
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298:
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 299:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 441 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..441

(D) OTHER INFORMATION: / Ceres Seq. ID 1582741
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299:
attcactgat tattgtttta aggcaaatta agatcatctt caaaatcttc tcagatctct 60
tccaattttc tagaaaaaac atgtcttgct gtggtggaag ctgtggttgt ggatctgcct 120
gcaagtgcgg caatggttgc ggaggttgca aaaggtaccc tgacttggag aacaccgcca 180
ccgagactct tgtcctcggt gttgctccgg cgatgaactc tcagtacgag gcttccggcg 240
agactttcgt tgccgagaat gatgcctgca aatgcggatc tgactgcaag tgcaaccctt 300
gtacctgcaa atgaagaact tcataaaccc taagtctgta ataaccctaa tgttatgtta 360
ggtttgctta tatgtaataa ttggctgatt tttccggtag ttttgccggc gacgtcgttc 420
tttactgcaa tattctttct g 441
(2) INFORMATION FOR SEQ ID NO: 300:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..103

(D) OTHER INFORMATION: / Ceres Seq. ID 1582742
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300:
Ser Leu Ile Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe
1 5 10 15

Ser Asp Leu Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly
20 25 30

Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly
35 40 45

Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val
50 55 60

Leu Gly Val Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu
65 70 75 80

Thr Phe Val Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys
85 90 95

Cys Asn Pro Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 301: 



(i) 



(ii) 
(ix) 



SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 7 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
MOLECULE TYPE: peptide 
FEATURE : 

(A) NAME /KEY : 

(B) LOCATION: 



peptide 
1. .77 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582743 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301: 
Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75

(2) INFORMATION FOR SEQ ID NO: 302:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 513 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..513

(D) OTHER INFORMATION: / Ceres Seq. ID 1582786
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302:
ctgacgcgag aactagtctc accagacaac aacaacaaca acaaaactta agggaaagat 60
aatcgtttgt ctgtgcagag agagagagag agagaaaatg aggttgttcg atccatggcc 120
agtgttcttc aagagagaat ggaaacgttg ctggccattc ctcaccggtt tcgccgtcac 180
cggcgttctc atcaccaagc taaccgccgg tctcactgag gaagatgcta agaactccaa 240
gttcgtccag caacacagtg attcactgaa gcaaaatgta gaagctgaca aggcataaat 300
atttgggact atagtgaatg cttcagcttc tttgaaacat gttcaataac aaagaagagc 360
gttctatatt actcttttat tttctctgag ttttgaaatc agctattctt tttctgaaac 420
ttagcaacaa atggtttttg ttcacgttat cttattccca tttgtttgga aatggatttt 480
atggcctttt tgaatcatgc gaacgcacat ttc
(2) INFORMATION FOR SEQ ID NO: 303:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 66 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..66

(D) OTHER INFORMATION: / Ceres Seq. ID 1582787
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303:
Met Arg Leu Phe Asp Pro Trp Pro Val Phe Phe Lys Arg Glu Trp Lys
1 5 10 15

Arg Cys Trp Pro Phe Leu Thr Gly Phe Ala Val Thr Gly Val Leu Ile
20 25 30

Thr Lys Leu Thr Ala Gly Leu Thr Glu Glu Asp Ala Lys Asn Ser Lys
35 40 45

Phe Val Gln Gln His Ser Asp Ser Leu Lys Gln Asn Val Glu Ala Asp
50 55 60

Lys Ala
65

(2) INFORMATION FOR SEQ ID NO: 304: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..34

(D) OTHER INFORMATION: / Ceres Seq. ID 1582788
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304:
Met Ala Ser Val Leu Gln Glu Arg Met Glu Thr Leu Leu Ala Ile Pro
1 5 10 15

His Arg Phe Arg Arg His Arg Arg Ser His His Gln Ala Asn Arg Arg
20 25 30

Ser His

(2) INFORMATION FOR SEQ ID NO: 305: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 681 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..681 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582825 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 305: 
aacaaacaca aaacttgatc tcacgactcc atcttctaaa attcagaatc tttctttgaa 60
acttttgtgt gaagaaaatg gcaccaaagg cagagaagaa gccggcagag aagaaaccag 120
tggaagagaa atcaaaagcc gaaaaagctc cagcggagaa gaaaccaaaa gccggcaaga 180
aactcccgaa gaagccggtg ccggcggcga taagaagaag aaaatgaaga agaagagtgt 240
tgagacgtac aagatctaca tcttcaaggt tctgaaacaa gttcatccag atattggtat 300
ttcaagcaaa gctatgggga ttatgaacag tttcatcaac gacatctttg agaaattggc 360
atcggaatcg tcgaaactcg cgaggtataa taagaagccg acgattactt ctcgggagat 420
tcagactgct gttagacttg ttcttcctgg tgagcttgcg aaacatgctg tttctgaagg 480
aactaaggcg gtgactaagt ttactagctc ttgaattgtg gatcttgttg aaatcgatgt 540
ttgtaaaatt agggtttttt agatttgatg gttgttgttg actctttgat cgatttctgt 600
ttcgttttct tgttgcgatg ttaatcaaat cggatccgat ttctttctta aatcaaatct 660
atcaagtaaa atcttttgcc g
(2) INFORMATION FOR SEQ ID NO: 306:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 170 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..170

(D) OTHER INFORMATION: / Ceres Seq. ID 1582826
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306:
Thr Asn Thr Lys Leu Asp Leu Thr Thr Pro Ser Ser Lys Ile Gln Asn
1 5 10 15

Leu Ser Leu Lys Leu Leu Cys Glu Glu Asn Gly Thr Lys Gly Arg Glu
20 25 30

Glu Ala Gly Arg Glu Glu Thr Ser Gly Arg Glu Ile Lys Ser Arg Lys
35 40 45

Ser Ser Ser Gly Glu Glu Thr Lys Ser Arg Gln Glu Thr Pro Glu Glu
50 55 60

Ala Gly Ala Gly Gly Asp Lys Lys Lys Lys Met Lys Lys Lys Ser Val
65 70 75 80

Glu Thr Tyr Lys Ile Tyr Ile Phe Lys Val Leu Lys Gln Val His Pro
85 90 95

Asp Ile Gly Ile Ser Ser Lys Ala Met Gly Ile Met Asn Ser Phe Ile
100 105 110

Asn Asp Ile Phe Glu Lys Leu Ala Ser Glu Ser Ser Lys Leu Ala Arg
115 120 125

Tyr Asn Lys Lys Pro Thr Ile Thr Ser Arg Glu Ile Gln Thr Ala Val
130 135 140

Arg Leu Val Leu Pro Gly Glu Leu Ala Lys His Ala Val Ser Glu Gly
145 150 155 160

Thr Lys Ala Val Thr Lys Phe Thr Ser Ser
165 170

(2) INFORMATION FOR SEQ ID NO: 307:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 96 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..96

(D) OTHER INFORMATION: / Ceres Seq. ID 1582827
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307:
Met Lys Lys Lys Ser Val Glu Thr Tyr Lys Ile Tyr Ile Phe Lys Val
1 5 10 15

Leu Lys Gln Val His Pro Asp Ile Gly Ile Ser Ser Lys Ala Met Gly
20 25 30

Ile Met Asn Ser Phe Ile Asn Asp Ile Phe Glu Lys Leu Ala Ser Glu
35 40 45

Ser Ser Lys Leu Ala Arg Tyr Asn Lys Lys Pro Thr Ile Thr Ser Arg
50 55 60

Glu Ile Gln Thr Ala Val Arg Leu Val Leu Pro Gly Glu Leu Ala Lys
65 70 75 80

His Ala Val Ser Glu Gly Thr Lys Ala Val Thr Lys Phe Thr Ser Ser
85 90 95

(2) INFORMATION FOR SEQ ID NO:308: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1291 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1291 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582927 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308: 
acatcttaaa aagtaaaaac acattcatct atccaacaaa aaaaaaaaaa aaaggagaaa 60
tggaaggaat cgatcataga atggtgagtg tcaatggcat aactatgcac attgccgaga 120
aaggtcccaa agaaggacct gtggtgcttc tcctccatgg attccctgat ctctggtaca 180
cgtggcgtca ccagattagt gggttatcat ctctaggtta ccgcgctgta gctccagacc 240
tccgaggcta cggagactct gattcgccag agtctttctc cgagtacacg tgtcttaacg 300
tcgttgggga cctcgtagct cttctggaca gtgttgctgg aaatcaagag aaggtgtttc 360
tggtcggtca tgattgggga gccattatcg gatggtttct ctgtttgttt cgacctgaaa 420
agattaacgg ctttgtgtgt ttgagtgtgc cgtatagatc aagaaaccct aaagtcaagc 480
ccgttcaagg gttcaaggct gtatttggag atgattacta catttgtaga tttcaggaac 540
cggggaagat tgaaggagag attgcaagtg cagatccaag aatatttctg aggaacctct 600
tcacagggag gacactcggt ccgccgattt tacctaagga taatcccttt ggggaaaaac 660
ctaaccctaa tagcgaaaac attgaattgc ctgaatggtt ttctaagaaa gatctcgatt 720
tctatgtctc caaattcgag aaggcaggat ttaccggtgg attgaactac tacagagcca 780
tggatctgaa ttgggagctc actgcaccat ggaccggagc taagattcaa gttccagtga 840
agttcatgac aggtgacttc gacatggttt acaccacacc agggatgaaa gagtacattc 900
acggtggtgg atttgctgca gatgttccaa ctcttcaaga gatagtggtg attgaagatg 960
ctggtcactt cgttaaccaa gagaaacctc aagaggtcac tgctcacatc aatgacttct 1020
tcaccaagct tcgggacaac aacaaaagct tttagagttc tcgtttggtt ctattatgtt 1080
ggctctcaaa acaagttggt tcttgcatgt gttgtttcga caagattttg aataagactt 1140
ggcattatga agctcaacgt gtgtaggaac aatcgattct ggcgaaaata gttagaaggc 1200
tggattgaag atttcaaaag aaaaccaatt ttttttatag atttcacaat gattttataa 1260
gcaaactaaa tgaaccagaa ataaccaagt g
(2) INFORMATION FOR SEQ ID NO:309: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..350 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582928 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309: 
Ile Leu Lys Ser Lys Asn Thr Phe Ile Tyr Pro Thr Lys Lys Lys Lys
1 5 10 15

Lys Gly Glu Met Glu Gly Ile Asp His Arg Met Val Ser Val Asn Gly
20 25 30

Ile Thr Met His Ile Ala Glu Lys Gly Pro Lys Glu Gly Pro Val Val
35 40 45

Leu Leu Leu His Gly Phe Pro Asp Leu Trp Tyr Thr Trp Arg His Gln
50 55 60

Ile Ser Gly Leu Ser Ser Leu Gly Tyr Arg Ala Val Ala Pro Asp Leu
65 70 75 80

Arg Gly Tyr Gly Asp Ser Asp Ser Pro Glu Ser Phe Ser Glu Tyr Thr
85 90 95

Cys Leu Asn Val Val Gly Asp Leu Val Ala Leu Leu Asp Ser Val Ala
100 105 110

Gly Asn Gln Glu Lys Val Phe Leu Val Gly His Asp Trp Gly Ala Ile
115 120 125

Ile Gly Trp Phe Leu Cys Leu Phe Arg Pro Glu Lys Ile Asn Gly Phe
130 135 140

Val Cys Leu Ser Val Pro Tyr Arg Ser Arg Asn Pro Lys Val Lys Pro
145 150 155 160

Val Gln Gly Phe Lys Ala Val Phe Gly Asp Asp Tyr Tyr Ile Cys Arg
165 170 175

Phe Gln Glu Pro Gly Lys Ile Glu Gly Glu Ile Ala Ser Ala Asp Pro
180 185 190

Arg Ile Phe Leu Arg Asn Leu Phe Thr Gly Arg Thr Leu Gly Pro Pro
195 200 205

Ile Leu Pro Lys Asp Asn Pro Phe Gly Glu Lys Pro Asn Pro Asn Ser
210 215 220

Glu Asn Ile Glu Leu Pro Glu Trp Phe Ser Lys Lys Asp Leu Asp Phe
225 230 235 240

Tyr Val Ser Lys Phe Glu Lys Ala Gly Phe Thr Gly Gly Leu Asn Tyr
245 250 255

Tyr Arg Ala Met Asp Leu Asn Trp Glu Leu Thr Ala Pro Trp Thr Gly
260 265 270

Ala Lys Ile Gln Val Pro Val Lys Phe Met Thr Gly Asp Phe Asp Met
275 280 285

Val Tyr Thr Thr Pro Gly Met Lys Glu Tyr Ile His Gly Gly Gly Phe
290 295 300

Ala Ala Asp Val Pro Thr Leu Gln Glu Ile Val Val Ile Glu Asp Ala
305 310 315 320

Gly His Phe Val Asn Gln Glu Lys Pro Gln Glu Val Thr Ala His Ile
325 330 335

Asn Asp Phe Phe Thr Lys Leu Arg Asp Asn Asn Lys Ser Phe
340 345 350

(2) INFORMATION FOR SEQ ID NO: 310: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 331 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..331 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582929 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310: 
Met Glu Gly Ile Asp His Arg Met Val Ser Val Asn Gly Ile Thr Met
1 5 10 15

His Ile Ala Glu Lys Gly Pro Lys Glu Gly Pro Val Val Leu Leu Leu
20 25 30

His Gly Phe Pro Asp Leu Trp Tyr Thr Trp Arg His Gln Ile Ser Gly
35 40 45

Leu Ser Ser Leu Gly Tyr Arg Ala Val Ala Pro Asp Leu Arg Gly Tyr
50 55 60

Gly Asp Ser Asp Ser Pro Glu Ser Phe Ser Glu Tyr Thr Cys Leu Asn
65 70 75 80

Val Val Gly Asp Leu Val Ala Leu Leu Asp Ser Val Ala Gly Asn Gln
85 90 95

Glu Lys Val Phe Leu Val Gly His Asp Trp Gly Ala Ile Ile Gly Trp
100 105 110

Phe Leu Cys Leu Phe Arg Pro Glu Lys Ile Asn Gly Phe Val Cys Leu
115 120 125

Ser Val Pro Tyr Arg Ser Arg Asn Pro Lys Val Lys Pro Val Gln Gly
130 135 140

Phe Lys Ala Val Phe Gly Asp Asp Tyr Tyr Ile Cys Arg Phe Gln Glu
145 150 155 160

Pro Gly Lys Ile Glu Gly Glu Ile Ala Ser Ala Asp Pro Arg Ile Phe
165 170 175

Leu Arg Asn Leu Phe Thr Gly Arg Thr Leu Gly Pro Pro Ile Leu Pro
180 185 190

Lys Asp Asn Pro Phe Gly Glu Lys Pro Asn Pro Asn Ser Glu Asn Ile
195 200 205

Glu Leu Pro Glu Trp Phe Ser Lys Lys Asp Leu Asp Phe Tyr Val Ser
210 215 220

Lys Phe Glu Lys Ala Gly Phe Thr Gly Gly Leu Asn Tyr Tyr Arg Ala
225 230 235 240

Met Asp Leu Asn Trp Glu Leu Thr Ala Pro Trp Thr Gly Ala Lys Ile
245 250 255

Gln Val Pro Val Lys Phe Met Thr Gly Asp Phe Asp Met Val Tyr Thr
260 265 270

Thr Pro Gly Met Lys Glu Tyr Ile His Gly Gly Gly Phe Ala Ala Asp
275 280 285

Val Pro Thr Leu Gln Glu Ile Val Val Ile Glu Asp Ala Gly His Phe
290 295 300

Val Asn Gln Glu Lys Pro Gln Glu Val Thr Ala His Ile Asn Asp Phe
305 310 315 320

Phe Thr Lys Leu Arg Asp Asn Asn Lys Ser Phe
325 330

(2) INFORMATION FOR SEQ ID NO: 311:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 324 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..324 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582930 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311: 
Met Val Ser Val Asn Gly Ile Thr Met His Ile Ala Glu Lys Gly Pro
1 5 10 15

Lys Glu Gly Pro Val Val Leu Leu Leu His Gly Phe Pro Asp Leu Trp
20 25 30

Tyr Thr Trp Arg His Gln Ile Ser Gly Leu Ser Ser Leu Gly Tyr Arg
35 40 45

Ala Val Ala Pro Asp Leu Arg Gly Tyr Gly Asp Ser Asp Ser Pro Glu
50 55 60

Ser Phe Ser Glu Tyr Thr Cys Leu Asn Val Val Gly Asp Leu Val Ala
65 70 75 80

Leu Leu Asp Ser Val Ala Gly Asn Gln Glu Lys Val Phe Leu Val Gly
85 90 95

His Asp Trp Gly Ala Ile Ile Gly Trp Phe Leu Cys Leu Phe Arg Pro
100 105 110

Glu Lys Ile Asn Gly Phe Val Cys Leu Ser Val Pro Tyr Arg Ser Arg
115 120 125

Asn Pro Lys Val Lys Pro Val Gln Gly Phe Lys Ala Val Phe Gly Asp
130 135 140

Asp Tyr Tyr Ile Cys Arg Phe Gln Glu Pro Gly Lys Ile Glu Gly Glu
145 150 155 160

Ile Ala Ser Ala Asp Pro Arg Ile Phe Leu Arg Asn Leu Phe Thr Gly
165 170 175

Arg Thr Leu Gly Pro Pro Ile Leu Pro Lys Asp Asn Pro Phe Gly Glu
180 185 190

Lys Pro Asn Pro Asn Ser Glu Asn Ile Glu Leu Pro Glu Trp Phe Ser
195 200 205

Lys Lys Asp Leu Asp Phe Tyr Val Ser Lys Phe Glu Lys Ala Gly Phe
210 215 220

Thr Gly Gly Leu Asn Tyr Tyr Arg Ala Met Asp Leu Asn Trp Glu Leu
225 230 235 240

Thr Ala Pro Trp Thr Gly Ala Lys Ile Gln Val Pro Val Lys Phe Met
245 250 255

Thr Gly Asp Phe Asp Met Val Tyr Thr Thr Pro Gly Met Lys Glu Tyr
260 265 270

Ile His Gly Gly Gly Phe Ala Ala Asp Val Pro Thr Leu Gln Glu Ile
275 280 285

Val Val Ile Glu Asp Ala Gly His Phe Val Asn Gln Glu Lys Pro Gln
290 295 300

Glu Val Thr Ala His Ile Asn Asp Phe Phe Thr Lys Leu Arg Asp Asn
305 310 315 320

Asn Lys Ser Phe

(2) INFORMATION FOR SEQ ID NO: 312:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 675 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..675

(D) OTHER INFORMATION: / Ceres Seq. ID 1582959
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312:
tgggcaaggc aacaaacata atcaacttaa atcttatcta ctttctattt ctttttaatc 60
aaaattaccg ttcttaacta tggcgaagtg gtttttcact atcttcttgg tttttgccct 120
agcctcagct ttagcttgtg gcgcaagaaa cgtcccagta ggcctctctg accaaaagaa 180
ctacctcgga tatggtggcg gatattccgg cgttggagac aatggtttac cctttggtgg 240
cgtcggtgga ggtgtgtctg gtcccggagg taatcttggt tatgggggat ttggtggtgc 300
tggtggcggc ttaggcggtg gtttgggcgg tggagcaggc agtggattag gcggtggctt 360
aggtggtgga agtggaattg gtgccggaac cagtggagga agtaccggag gagttcattt 420
cccttgagtt gttactttgg tttttaaggc gtcatacggt ccttattaag ctaggtctag 480
cttaagatga tgtcataata ataatttatc atatctcttt agggttttaa actttggtat 540
tatgaattat cattagctgt ttaacgtgcg tcttaagtta ctattttaac gtatgtttga 600
atcagtctag tggcttgtcg tgtcatggct tggtccattt tcaaattcta ctttgacctt 660
ttcgagtgtt tcacc

(2) INFORMATION FOR SEQ ID NO: 313:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..115

(D) OTHER INFORMATION: / Ceres Seq. ID 1582960
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313:
Met Ala Lys Trp Phe Phe Thr Ile Phe Leu Val Phe Ala Leu Ala Ser
1 5 10 15

Ala Leu Ala Cys Gly Ala Arg Asn Val Pro Val Gly Leu Ser Asp Gln
20 25 30

Lys Asn Tyr Leu Gly Tyr Gly Gly Gly Tyr Ser Gly Val Gly Asp Asn
35 40 45

Gly Leu Pro Phe Gly Gly Val Gly Gly Gly Val Ser Gly Pro Gly Gly
50 55 60

Asn Leu Gly Tyr Gly Gly Phe Gly Gly Ala Gly Gly Gly Leu Gly Gly
65 70 75 80

Gly Leu Gly Gly Gly Ala Gly Ser Gly Leu Gly Gly Gly Leu Gly Gly
85 90 95

Gly Ser Gly Ile Gly Ala Gly Thr Ser Gly Gly Ser Thr Gly Gly Val
100 105 110

His Phe Pro
115

(2) INFORMATION FOR SEQ ID NO: 314:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 979 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..979

(D) OTHER INFORMATION: / Ceres Seq. ID 1582997
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 314:
aattcagaag atcgaagaag atgagtagag gaagcggcgc tggttatgat cgtcacatca 60
ctatcttctc accggaaggt cgtctctttc aagttgaata tgccttcaag gccgtgaaaa 120
cagctggtat tacttcaatc ggagttcgtg ggaaagattc ggtttgcgtc gttacccaga 180
agaaagttcc tgacaagctt ttggatcagt ctagtgttac acatctgttc cctatcacca 240
agttcattgg attggtagct actggcatta cagctgatgc gaggtctttg gttcaacaag 300
caaggaacca agcagctgaa tttcggttca cttatggata cgagatgcct gttgacattc 360
ttgctaaatg gatagcggac aagtcacagg tctacactca acatgcttac atgagacccc 420
ttggagttgt cgctatggta atgggtgttg atgaagagaa tggtccctta ctttacaaat 480
gtgacccagc tggacatttt tacggtcaca aggcaactag tgctggtatg aaggaacaag 540
aagcagtcaa tttcttggag aagaaaatga aagaaaaccc atctttcaca tttgatgaaa 600
ccgtgcagac tgccatatcg gctttgcaat ctgttcttca agaagacttc aaggctactg 660
agattgaggt aggagtggtg agagcagaga acccggaatt ccgtgcattg acaacggagg 720
agattgagga gcatttgaca gccattagtg aacgagactg atcaacttag tgaaacaagt 780
gtgagttacg ttgcttcgct tctatcagac attcctctaa agtgaccact ctccatcgat 840
cttttgtttg gtcttccttc cataatttaa tttacttagt caacccagca aacttgaaaa 900
acaaaaatgc actgtaatgc ttaaggctgt gagagtttcc gagtcttttg actcttcaga 960
aatgacacca actttactc
(2) INFORMATION FOR SEQ ID NO: 315: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 252 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..252 

(D) OTHER INFORMATION: / Ceres Seq. ID 1582998 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315: 
Phe Arg Arg Ser Lys Lys Met Ser Arg Gly Ser Gly Ala Gly Tyr Asp
1 5 10 15

Arg His Ile Thr Ile Phe Ser Pro Glu Gly Arg Leu Phe Gln Val Glu
20 25 30

Tyr Ala Phe Lys Ala Val Lys Thr Ala Gly Ile Thr Ser Ile Gly Val
35 40 45

Arg Gly Lys Asp Ser Val Cys Val Val Thr Gln Lys Lys Val Pro Asp
50 55 60

Lys Leu Leu Asp Gln Ser Ser Val Thr His Leu Phe Pro Ile Thr Lys
65 70 75 80

Phe Ile Gly Leu Val Ala Thr Gly Ile Thr Ala Asp Ala Arg Ser Leu
85 90 95

Val Gln Gln Ala Arg Asn Gln Ala Ala Glu Phe Arg Phe Thr Tyr Gly
100 105 110

Tyr Glu Met Pro Val Asp Ile Leu Ala Lys Trp Ile Ala Asp Lys Ser
115 120 125

Gln Val Tyr Thr Gln His Ala Tyr Met Arg Pro Leu Gly Val Val Ala
130 135 140

Met Val Met Gly Val Asp Glu Glu Asn Gly Pro Leu Leu Tyr Lys Cys
145 150 155 160

Asp Pro Ala Gly His Phe Tyr Gly His Lys Ala Thr Ser Ala Gly Met
165 170 175

Lys Glu Gln Glu Ala Val Asn Phe Leu Glu Lys Lys Met Lys Glu Asn
180 185 190

Pro Ser Phe Thr Phe Asp Glu Thr Val Gln Thr Ala Ile Ser Ala Leu
195 200 205

Gln Ser Val Leu Gln Glu Asp Phe Lys Ala Thr Glu Ile Glu Val Gly
210 215 220

Val Val Arg Ala Glu Asn Pro Glu Phe Arg Ala Leu Thr Thr Glu Glu
225 230 235 240

Ile Glu Glu His Leu Thr Ala Ile Ser Glu Arg Asp
245 250
(2) INFORMATION FOR SEQ ID NO: 316: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 246 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..246

(D) OTHER INFORMATION: / Ceres Seq. ID 1582999
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 316:
Met Ser Arg Gly Ser Gly Ala Gly Tyr Asp Arg His Ile Thr Ile Phe
1 5 10 15

Ser Pro Glu Gly Arg Leu Phe Gln Val Glu Tyr Ala Phe Lys Ala Val
20 25 30

Lys Thr Ala Gly Ile Thr Ser Ile Gly Val Arg Gly Lys Asp Ser Val
35 40 45

Cys Val Val Thr Gln Lys Lys Val Pro Asp Lys Leu Leu Asp Gln Ser
50 55 60

Ser Val Thr His Leu Phe Pro Ile Thr Lys Phe Ile Gly Leu Val Ala
65 70 75 80

Thr Gly Ile Thr Ala Asp Ala Arg Ser Leu Val Gln Gln Ala Arg Asn
85 90 95

Gln Ala Ala Glu Phe Arg Phe Thr Tyr Gly Tyr Glu Met Pro Val Asp
100 105 110

Ile Leu Ala Lys Trp Ile Ala Asp Lys Ser Gln Val Tyr Thr Gln His
115 120 125

Ala Tyr Met Arg Pro Leu Gly Val Val Ala Met Val Met Gly Val Asp
130 135 140

Glu Glu Asn Gly Pro Leu Leu Tyr Lys Cys Asp Pro Ala Gly His Phe
145 150 155 160

Tyr Gly His Lys Ala Thr Ser Ala Gly Met Lys Glu Gln Glu Ala Val
165 170 175

Asn Phe Leu Glu Lys Lys Met Lys Glu Asn Pro Ser Phe Thr Phe Asp
180 185 190

Glu Thr Val Gln Thr Ala Ile Ser Ala Leu Gln Ser Val Leu Gln Glu
195 200 205

Asp Phe Lys Ala Thr Glu Ile Glu Val Gly Val Val Arg Ala Glu Asn
210 215 220

Pro Glu Phe Arg Ala Leu Thr Thr Glu Glu Ile Glu Glu His Leu Thr
225 230 235 240

Ala Ile Ser Glu Arg Asp
245

(2) INFORMATION FOR SEQ ID NO: 317: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 138 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..138 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583000 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317: 
Met Pro Val Asp Ile Leu Ala Lys Trp Ile Ala Asp Lys Ser Gln Val
1 5 10 15

Tyr Thr Gln His Ala Tyr Met Arg Pro Leu Gly Val Val Ala Met Val
20 25 30

Met Gly Val Asp Glu Glu Asn Gly Pro Leu Leu Tyr Lys Cys Asp Pro
35 40 45

Ala Gly His Phe Tyr Gly His Lys Ala Thr Ser Ala Gly Met Lys Glu
50 55 60

Gln Glu Ala Val Asn Phe Leu Glu Lys Lys Met Lys Glu Asn Pro Ser
65 70 75 80

Phe Thr Phe Asp Glu Thr Val Gln Thr Ala Ile Ser Ala Leu Gln Ser
85 90 95

Val Leu Gln Glu Asp Phe Lys Ala Thr Glu Ile Glu Val Gly Val Val
100 105 110

Arg Ala Glu Asn Pro Glu Phe Arg Ala Leu Thr Thr Glu Glu Ile Glu
115 120 125

Glu His Leu Thr Ala Ile Ser Glu Arg Asp
130 135

(2) INFORMATION FOR SEQ ID NO: 318:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 929 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..929 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583044 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318: 
attcctaagc aaatctcttc tctctcccat tcgtctccaa agagattaca gttttgagca 60
tttctcatct ctcgaagctc ttctctgtgt gtgtggcgat ggctgctaat tcgataatgg 120
cttcctccaa acccctaatc tccctgtcat ccaaccaaca accaaaccga gtccaaattc 180 
ccaaattcgc caaacttccc caaattccca aatccctcac ttcctccacc gatctccgta 240 
gcaaagcact atcactctcc tccgccaccg ccaaatcctt agctttaatc gccgctttcg 300 
ctcctccgtc gatggcggag gcgatggaga aagcacagct cttcgatttc aatctcacgc 360 
ttccgatcat cgttgttgag tttctcttct tgatgttcgc tctcgacaag gtctattact 420 
ctccgcttgg taacttcatg gatcaaagag acgcttccat caaagagaag ctcgcgagtg 480 
ttaaggacac ttcgactgaa gtaaaggagc tcgatgagca agccgccgcc gtaatgagag 540 
cagctagggc tgagatcgcc gccgcgctta acaagatgaa gaaggagact caggttgaag 600 
tcgaggagaa gctagcggag ggaaggaaga aggtggagga agagctaaaa gaagctttgg 660 
cgagcttgga gagtcagaaa gaagaaacca ttaaagcttt ggattctcag attgctgctc 720 
ttagtgaaga cattgtcaag aaggttcttc cttcttaaat tatatttttg ttaactgtgt 780 
aattctctgt ctctctatct caaaacttat ttacaagaaa ttactgtaaa tctcttcttc 840 
ttcttcttct ctgtttcttg gattgttcgt cgttcaaaga agaacaattt ttattttgta 900 
agtttataaa taattagctc tcttctgcc 
(2) INFORMATION FOR SEQ ID NO: 319: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 219 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..219 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583045 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:319: 
Met Ala Ala Asn Ser Ile Met Ala Ser Ser Lys Pro Leu Ile Ser Leu
1 5 10 15

Ser Ser Asn Gln Gln Pro Asn Arg Val Gln Ile Pro Lys Phe Ala Lys
20 25 30

Leu Pro Gln Ile Pro Lys Ser Leu Thr Ser Ser Thr Asp Leu Arg Ser
35 40 45

Lys Ala Leu Ser Leu Ser Ser Ala Thr Ala Lys Ser Leu Ala Leu Ile
50 55 60

Ala Ala Phe Ala Pro Pro Ser Met Ala Glu Ala Met Glu Lys Ala Gln
65 70 75 80

Leu Phe Asp Phe Asn Leu Thr Leu Pro Ile Ile Val Val Glu Phe Leu
85 90 95

Phe Leu Met Phe Ala Leu Asp Lys Val Tyr Tyr Ser Pro Leu Gly Asn
100 105 110

Phe Met Asp Gln Arg Asp Ala Ser Ile Lys Glu Lys Leu Ala Ser Val
115 120 125

Lys Asp Thr Ser Thr Glu Val Lys Glu Leu Asp Glu Gln Ala Ala Ala
130 135 140

Val Met Arg Ala Ala Arg Ala Glu Ile Ala Ala Ala Leu Asn Lys Met
145 150 155 160

Lys Lys Glu Thr Gln Val Glu Val Glu Glu Lys Leu Ala Glu Gly Arg
165 170 175

Lys Lys Val Glu Glu Glu Leu Lys Glu Ala Leu Ala Ser Leu Glu Ser
180 185 190

Gln Lys Glu Glu Thr Ile Lys Ala Leu Asp Ser Gln Ile Ala Ala Leu
195 200 205

Ser Glu Asp Ile Val Lys Lys Val Leu Pro Ser
210 215
(2) INFORMATION FOR SEQ ID NO: 320: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 213 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..213 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583046 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 320:
Met Ala Ser Ser Lys Pro Leu Ile Ser Leu Ser Ser Asn Gln Gln Pro
1 5 10 15

Asn Arg Val Gln Ile Pro Lys Phe Ala Lys Leu Pro Gln Ile Pro Lys
20 25 30

Ser Leu Thr Ser Ser Thr Asp Leu Arg Ser Lys Ala Leu Ser Leu Ser
35 40 45

Ser Ala Thr Ala Lys Ser Leu Ala Leu Ile Ala Ala Phe Ala Pro Pro
50 55 60

Ser Met Ala Glu Ala Met Glu Lys Ala Gln Leu Phe Asp Phe Asn Leu
65 70 75 80

Thr Leu Pro Ile Ile Val Val Glu Phe Leu Phe Leu Met Phe Ala Leu
85 90 95

Asp Lys Val Tyr Tyr Ser Pro Leu Gly Asn Phe Met Asp Gln Arg Asp
100 105 110

Ala Ser Ile Lys Glu Lys Leu Ala Ser Val Lys Asp Thr Ser Thr Glu
115 120 125

Val Lys Glu Leu Asp Glu Gln Ala Ala Ala Val Met Arg Ala Ala Arg
130 135 140

Ala Glu Ile Ala Ala Ala Leu Asn Lys Met Lys Lys Glu Thr Gln Val
145 150 155 160

Glu Val Glu Glu Lys Leu Ala Glu Gly Arg Lys Lys Val Glu Glu Glu
165 170 175

Leu Lys Glu Ala Leu Ala Ser Leu Glu Ser Gln Lys Glu Glu Thr Ile
180 185 190

Lys Ala Leu Asp Ser Gln Ile Ala Ala Leu Ser Glu Asp Ile Val Lys
195 200 205

Lys Val Leu Pro Ser
210

(2) INFORMATION FOR SEQ ID NO: 321: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 148 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..148

(D) OTHER INFORMATION: / Ceres Seq. ID 1583047
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321:
Met Ala Glu Ala Met Glu Lys Ala Gln Leu Phe Asp Phe Asn Leu Thr
1 5 10 15

Leu Pro Ile Ile Val Val Glu Phe Leu Phe Leu Met Phe Ala Leu Asp
20 25 30

Lys Val Tyr Tyr Ser Pro Leu Gly Asn Phe Met Asp Gln Arg Asp Ala
35 40 45

Ser Ile Lys Glu Lys Leu Ala Ser Val Lys Asp Thr Ser Thr Glu Val
50 55 60

Lys Glu Leu Asp Glu Gln Ala Ala Ala Val Met Arg Ala Ala Arg Ala
65 70 75 80

Glu Ile Ala Ala Ala Leu Asn Lys Met Lys Lys Glu Thr Gln Val Glu
85 90 95

Val Glu Glu Lys Leu Ala Glu Gly Arg Lys Lys Val Glu Glu Glu Leu
100 105 110

Lys Glu Ala Leu Ala Ser Leu Glu Ser Gln Lys Glu Glu Thr Ile Lys
115 120 125

Ala Leu Asp Ser Gln Ile Ala Ala Leu Ser Glu Asp Ile Val Lys Lys
130 135 140

Val Leu Pro Ser
145

(2) INFORMATION FOR SEQ ID NO: 322: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1429 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1429 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583080 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322: 
atcatctcat taacaaaaat aaaacacaca atctcaagat tttctacttc ttattacaaa 60
gattcaatct tcttgtttct tcttgcaacc atgagtcttc ttgcagatct tgttaacctt 120
gacatctcag acaacagtga aaagatcatc gctgaataca tatgggttgg tggttctggt 180
atggacatga gaagcaaagc caggactctc cctggacctg tgaccgatcc atcaaaactt 240
ccaaagtgga actatgatgg ttcaagcact ggtcaagctc ctggtcaaga cagtgaagtg 300
atcttatacc ctcaagcaat tttcaaagat ccattccgta gaggcaacaa catccttgtt 360
atgtgtgatg cttacactcc agcgggagag ccaatcccta ctaacaagcg acatgctgcg 420
gctgagatct ttgctaaccc tgatgttatt gctgaagtgc catggtatgg aatcgaacaa 480
gaatacactt tgttgcagaa ggatgtgaac tggcctcttg gatggcccat tggtggcttc 540
cctggccctc agggaccata ctactgcagt attggagctg acaaatcttt tggaagagac 600
attgttgatg ctcactacaa agcctctttg tatgctggaa tcaacatcag tgggatcaat 660
ggagaagtca tgccgggaca atgggagttc caagtcggcc catcggtcgg tatctcagct 720
gctgatgaaa tatggatcgc tcgttacatt ttggagagga tcacagagat tgctggtgtg 780
gttgtatctt ttgacccaaa acctattcct ggtgactgga atggagctgg tgctcacacc 840
aattacagta ctaaatcaat gagggaagaa ggaggatacg agataatcaa gaaggcgatc 900
gagaagcttg gcttgagaca caaggaacac atttccgctt acggtgaagg aaacgagctt 960
cgtctcacgg gacaccatga aactgctgac atcaacactt tcctttgggg tgttgcgaac 1020
cgtggtgcat cgatccgagt aggacgtgac accgagaaag aagggaaggg atactttgag 1080
gataggaggc cagcttcaaa catggaccct tacgttgtta cttccatgat tgcagagact 1140
acactcctct ggaacccttg aaaggatgat ccgtaactct tgaagctgct tctgattggg 1200
ttttttggaa gttccaagct tgtcttttct ctacagtgtg tattaagcaa ttgtaccggt 1260
tgacactgcc ggagtttgtg atttggggcc tttctttctt tttcttcttt ttataatctt 1320
ttgggttctg tggttagagc aaattcggtt tgctctgttt gtttgacctt tattgaaacc 1380
tttggtattt gtactaataa tacaatctga aaaggcctct tcatgttcc
(2) INFORMATION FOR SEQ ID NO: 323: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 386 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..386 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583081 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323: 
Ile Ile Ser Leu Thr Lys Ile Lys His Thr Ile Ser Arg Phe Ser Thr 
1 5 10 15 

Ser Tyr Tyr Lys Asp Ser Ile Phe Leu Phe Leu Leu Ala Thr Met Ser 
20 25 30 

Leu Leu Ala Asp Leu Val Asn Leu Asp Ile Ser Asp Asn Ser Glu Lys 
35 40 45 

Ile Ile Ala Glu Tyr Ile Trp Val Gly Gly Ser Gly Met Asp Met Arg 
50 55 60 

Ser Lys Ala Arg Thr Leu Pro Gly Pro Val Thr Asp Pro Ser Lys Leu 
65 70 75 80 

Pro Lys Trp Asn Tyr Asp Gly Ser Ser Thr Gly Gln Ala Pro Gly Gln 
85 90 95 

Asp Ser Glu Val Ile Leu Tyr Pro Gln Ala Ile Phe Lys Asp Pro Phe 
100 105 110 

Arg Arg Gly Asn Asn Ile Leu Val Met Cys Asp Ala Tyr Thr Pro Ala 
115 120 125 

Gly Glu Pro Ile Pro Thr Asn Lys Arg His Ala Ala Ala Glu Ile Phe 
130 135 140 

Ala Asn Pro Asp Val Ile Ala Glu Val Pro Trp Tyr Gly Ile Glu Gln 
145 150 155 160 

Glu Tyr Thr Leu Leu Gln Lys Asp Val Asn Trp Pro Leu Gly Trp Pro 
165 170 175 

Ile Gly Gly Phe Pro Gly Pro Gln Gly Pro Tyr Tyr Cys Ser Ile Gly 
180 185 190 

Ala Asp Lys Ser Phe Gly Arg Asp Ile Val Asp Ala His Tyr Lys Ala 
195 200 205 

Ser Leu Tyr Ala Gly Ile Asn Ile Ser Gly Ile Asn Gly Glu Val Met 
210 215 220 

Pro Gly Gln Trp Glu Phe Gln Val Gly Pro Ser Val Gly Ile Ser Ala 
225 230 235 240 

Ala Asp Glu Ile Trp Ile Ala Arg Tyr Ile Leu Glu Arg Ile Thr Glu 
245 250 255 

Ile Ala Gly Val Val Val Ser Phe Asp Pro Lys Pro Ile Pro Gly Asp 
260 265 270 

Trp Asn Gly Ala Gly Ala His Thr Asn Tyr Ser Thr Lys Ser Met Arg 
275 280 285 

Glu Glu Gly Gly Tyr Glu Ile Ile Lys Lys Ala Ile Glu Lys Leu Gly 
290 295 300 

Leu Arg His Lys Glu His Ile Ser Ala Tyr Gly Glu Gly Asn Glu Leu 
305 310 315 320 



Arg Leu Thr Gly His His Glu Thr Ala Asp Ile Asn Thr Phe Leu Trp 
325 330 335 

Gly Val Ala Asn Arg Gly Ala Ser Ile Arg Val Gly Arg Asp Thr Glu 
340 345 350 

Lys Glu Gly Lys Gly Tyr Phe Glu Asp Arg Arg Pro Ala Ser Asn Met 
355 360 365 

Asp Pro Tyr Val Val Thr Ser Met Ile Ala Glu Thr Thr Leu Leu Trp 
370 375 380 

Asn Pro 
385 

(2) INFORMATION FOR SEQ ID NO: 324: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 356 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..356 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583082 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324: 
Met Ser Leu Leu Ala Asp Leu Val Asn Leu Asp Ile Ser Asp Asn Ser 
1 5 10 15 

Glu Lys Ile Ile Ala Glu Tyr Ile Trp Val Gly Gly Ser Gly Met Asp 
20 25 30 

Met Arg Ser Lys Ala Arg Thr Leu Pro Gly Pro Val Thr Asp Pro Ser 
35 40 45 

Lys Leu Pro Lys Trp Asn Tyr Asp Gly Ser Ser Thr Gly Gln Ala Pro 
50 55 60 

Gly Gln Asp Ser Glu Val Ile Leu Tyr Pro Gln Ala Ile Phe Lys Asp 
65 70 75 80 

Pro Phe Arg Arg Gly Asn Asn Ile Leu Val Met Cys Asp Ala Tyr Thr 
85 90 95 

Pro Ala Gly Glu Pro Ile Pro Thr Asn Lys Arg His Ala Ala Ala Glu 
100 105 110 

Ile Phe Ala Asn Pro Asp Val Ile Ala Glu Val Pro Trp Tyr Gly Ile 
115 120 125 

Glu Gln Glu Tyr Thr Leu Leu Gln Lys Asp Val Asn Trp Pro Leu Gly 
130 135 140 

Trp Pro Ile Gly Gly Phe Pro Gly Pro Gln Gly Pro Tyr Tyr Cys Ser 
145 150 155 160 

Ile Gly Ala Asp Lys Ser Phe Gly Arg Asp Ile Val Asp Ala His Tyr 
165 170 175 

Lys Ala Ser Leu Tyr Ala Gly Ile Asn Ile Ser Gly Ile Asn Gly Glu 
180 185 190 

Val Met Pro Gly Gln Trp Glu Phe Gln Val Gly Pro Ser Val Gly Ile 
195 200 205 

Ser Ala Ala Asp Glu Ile Trp Ile Ala Arg Tyr Ile Leu Glu Arg Ile 
210 215 220 

Thr Glu Ile Ala Gly Val Val Val Ser Phe Asp Pro Lys Pro Ile Pro 
225 230 235 240 

Gly Asp Trp Asn Gly Ala Gly Ala His Thr Asn Tyr Ser Thr Lys Ser 
245 250 255 

Met Arg Glu Glu Gly Gly Tyr Glu Ile Ile Lys Lys Ala Ile Glu Lys 
260 265 270 

Leu Gly Leu Arg His Lys Glu His Ile Ser Ala Tyr Gly Glu Gly Asn 
275 280 285 

Glu Leu Arg Leu Thr Gly His His Glu Thr Ala Asp Ile Asn Thr Phe 
290 295 300 

Leu Trp Gly Val Ala Asn Arg Gly Ala Ser Ile Arg Val Gly Arg Asp 
305 310 315 320 

Thr Glu Lys Glu Gly Lys Gly Tyr Phe Glu Asp Arg Arg Pro Ala Ser 
325 330 335 

Asn Met Asp Pro Tyr Val Val Thr Ser Met Ile Ala Glu Thr Thr Leu 
340 345 350 

Leu Trp Asn Pro 
355 

(2) INFORMATION FOR SEQ ID NO: 325: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 326 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..326 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583083 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325: 
Met Asp Met Arg Ser Lys Ala Arg Thr Leu Pro Gly Pro Val Thr Asp 
1 5 10 15 

Pro Ser Lys Leu Pro Lys Trp Asn Tyr Asp Gly Ser Ser Thr Gly Gln 
20 25 30 

Ala Pro Gly Gln Asp Ser Glu Val Ile Leu Tyr Pro Gln Ala Ile Phe 
35 40 45 

Lys Asp Pro Phe Arg Arg Gly Asn Asn Ile Leu Val Met Cys Asp Ala 
50 55 60 

Tyr Thr Pro Ala Gly Glu Pro Ile Pro Thr Asn Lys Arg His Ala Ala 
65 70 75 80 

Ala Glu Ile Phe Ala Asn Pro Asp Val Ile Ala Glu Val Pro Trp Tyr 
85 90 95 

Gly Ile Glu Gln Glu Tyr Thr Leu Leu Gln Lys Asp Val Asn Trp Pro 
100 105 110 

Leu Gly Trp Pro Ile Gly Gly Phe Pro Gly Pro Gln Gly Pro Tyr Tyr 
115 120 125 

Cys Ser Ile Gly Ala Asp Lys Ser Phe Gly Arg Asp Ile Val Asp Ala 
130 135 140 

His Tyr Lys Ala Ser Leu Tyr Ala Gly Ile Asn Ile Ser Gly Ile Asn 
145 150 155 160 

Gly Glu Val Met Pro Gly Gln Trp Glu Phe Gln Val Gly Pro Ser Val 
165 170 175 

Gly Ile Ser Ala Ala Asp Glu Ile Trp Ile Ala Arg Tyr Ile Leu Glu 
180 185 190 

Arg Ile Thr Glu Ile Ala Gly Val Val Val Ser Phe Asp Pro Lys Pro 
195 200 205 

Ile Pro Gly Asp Trp Asn Gly Ala Gly Ala His Thr Asn Tyr Ser Thr 
210 215 220 

Lys Ser Met Arg Glu Glu Gly Gly Tyr Glu Ile Ile Lys Lys Ala Ile 
225 230 235 240 

Glu Lys Leu Gly Leu Arg His Lys Glu His Ile Ser Ala Tyr Gly Glu 
245 250 255 

Gly Asn Glu Leu Arg Leu Thr Gly His His Glu Thr Ala Asp Ile Asn 
260 265 270 

Thr Phe Leu Trp Gly Val Ala Asn Arg Gly Ala Ser Ile Arg Val Gly 
275 280 285 

Arg Asp Thr Glu Lys Glu Gly Lys Gly Tyr Phe Glu Asp Arg Arg Pro 
290 295 300 

Ala Ser Asn Met Asp Pro Tyr Val Val Thr Ser Met Ile Ala Glu Thr 
305 310 315 320 

Thr Leu Leu Trp Asn Pro 
325 

(2) INFORMATION FOR SEQ ID NO: 326: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1601 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1601 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583099 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 326: 



ccctcttttt aaacctattc aaagccaagg acaagaaaaa aaaaagagtc gcccattttc 60 

ttctcatttt tttttttgct cttgacgaag aaaccaaaaa aaaaaaaaat gagagagatc 120 

cttcatatcc aaggcggtca atgtggaaac cagatcggag caaagttctg ggaagtgatc 180 

tgcgacgaac acggcattga tcacaccggt caatacgtcg gcgattctcc gttacagctt 240 

gaacgtatcg atgtctattt caacgaagct agcggtggaa agtacgttcc tcgcgctgtt 300 

cttatggatc tggagcctgg taccatggat tctctcagat ctggtccgtt cggtcagatt 360 

ttccgtcctg ataacttcgt ctttggtcaa tctggtgccg gaaataactg ggcgaaaggt 420 

cattacaccg aaggtgctga gttgattgat tctgttctcg atgttgtgag gaaggaagct 480 

gagaacagcg attgtcttca aggtttccaa gtgtgtcatt cattgggagg aggaactgga 540 

tctggaatgg gaactctatt gatttctaag ataagagaag agtatccaga tcgtatgatg 600 

atgactttct cagtgtttcc ttctcctaag gtctctgaca ctgttgttga gccatacaat 660 

gcaactctct ctgtgcatca gcttgtcgaa aacgctgacg agtgtatggt tttggacaat 720 

gaggctctct acgatatctg tttccgtacc ctcaagctcg ctaatcctac ctttggtgat 780 

cttaaccatc tcatctctgc tacaatgagt ggtgttactt gctgtcttcg tttccctggt 840 

cagcttaact ctgaccttag gaaactcgct gtgaacctta tcccattccc aaggcttcac 900 

ttcttcatgg ttggtttcgc accattgaca tcgagaggat cacagcaata cagtgccttg 960 

agtgttcctg aactgaccca gcagatgtgg gatgcaaaga acatgatgtg tgctgctgac 1020 

cctcgtcatg gacgttactt gactgcatcc gctgtgttcc gtggaaagct gagcaccaaa 1080 

taggttgacg agcagatgat gaacattcag aacaagaact catcctactt tgtggaatgg 1140 

atcccaaaca acgtcaagtc cagtgtctgt gatattgcac caaagggttt gaaaatggcg 1200 

tctactttca ttggtaactc aacctcaatc caggagatgt ttaggcgtgt gagcgaacag 1260 

ttcacagcta tgttcaggag aaaggctttc cttcattggt acacaggaga aggcatggac 1320 

gagatggagt tcactgaagc agagagtaac atgaatgatc ttgtcgcaga gtaccagcag 1380 

taccaagatg ctacagccgg agaggaggag tacgaggagg aagaagagga gtacgagact 1440 

taagatgttg tcaatggctc cctcggattc gtaagctgtg taagcaagca gctttcactt 1500 

tcttctttcc ccttatcctg aatttttttc ttcgtaatat ctcttttatt gtttcgttca 1560 



tgtgtgttcg ttttgttatt gaaaccctat atcggttctg c 
(2) INFORMATION FOR SEQ ID NO: 327: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 324 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..324 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583100 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327: 
Met Arg Glu Ile Leu His Ile Gln Gly Gly Gln Cys Gly Asn Gln Ile 
1 5 10 15 

Gly Ala Lys Phe Trp Glu Val Ile Cys Asp Glu His Gly Ile Asp His 
20 25 30 

Thr Gly Gln Tyr Val Gly Asp Ser Pro Leu Gln Leu Glu Arg Ile Asp 
35 40 45 

Val Tyr Phe Asn Glu Ala Ser Gly Gly Lys Tyr Val Pro Arg Ala Val 
50 55 60 

Leu Met Asp Leu Glu Pro Gly Thr Met Asp Ser Leu Arg Ser Gly Pro 
65 70 75 80 

Phe Gly Gln Ile Phe Arg Pro Asp Asn Phe Val Phe Gly Gln Ser Gly 
85 90 95 

Ala Gly Asn Asn Trp Ala Lys Gly His Tyr Thr Glu Gly Ala Glu Leu 
100 105 110 

Ile Asp Ser Val Leu Asp Val Val Arg Lys Glu Ala Glu Asn Ser Asp 
115 120 125 

Cys Leu Gln Gly Phe Gln Val Cys His Ser Leu Gly Gly Gly Thr Gly 
130 135 140 

Ser Gly Met Gly Thr Leu Leu Ile Ser Lys Ile Arg Glu Glu Tyr Pro 
145 150 155 160 

Asp Arg Met Met Met Thr Phe Ser Val Phe Pro Ser Pro Lys Val Ser 
165 170 175 

Asp Thr Val Val Glu Pro Tyr Asn Ala Thr Leu Ser Val His Gln Leu 
180 185 190 

Val Glu Asn Ala Asp Glu Cys Met Val Leu Asp Asn Glu Ala Leu Tyr 
195 200 205 

Asp Ile Cys Phe Arg Thr Leu Lys Leu Ala Asn Pro Thr Phe Gly Asp 
210 215 220 

Leu Asn His Leu Ile Ser Ala Thr Met Ser Gly Val Thr Cys Cys Leu 
225 230 235 240 

Arg Phe Pro Gly Gln Leu Asn Ser Asp Leu Arg Lys Leu Ala Val Asn 
245 250 255 

Leu Ile Pro Phe Pro Arg Leu His Phe Phe Met Val Gly Phe Ala Pro 
260 265 270 

Leu Thr Ser Arg Gly Ser Gln Gln Tyr Ser Ala Leu Ser Val Pro Glu 
275 280 285 

Leu Thr Gln Gln Met Trp Asp Ala Lys Asn Met Met Cys Ala Ala Asp 
290 295 300 

Pro Arg His Gly Arg Tyr Leu Thr Ala Ser Ala Val Phe Arg Gly Lys 
305 310 315 320 

Leu Ser Thr Lys 

(2) INFORMATION FOR SEQ ID NO: 328: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 259 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..259 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583101 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 328: 
Met Asp Leu Glu Pro Gly Thr Met Asp Ser Leu Arg Ser Gly Pro Phe 
1 5 10 15 

Gly Gln Ile Phe Arg Pro Asp Asn Phe Val Phe Gly Gln Ser Gly Ala 
20 25 30 

Gly Asn Asn Trp Ala Lys Gly His Tyr Thr Glu Gly Ala Glu Leu Ile 
35 40 45 

Asp Ser Val Leu Asp Val Val Arg Lys Glu Ala Glu Asn Ser Asp Cys 
50 55 60 

Leu Gln Gly Phe Gln Val Cys His Ser Leu Gly Gly Gly Thr Gly Ser 
65 70 75 80 

Gly Met Gly Thr Leu Leu Ile Ser Lys Ile Arg Glu Glu Tyr Pro Asp 
85 90 95 

Arg Met Met Met Thr Phe Ser Val Phe Pro Ser Pro Lys Val Ser Asp 
100 105 110 

Thr Val Val Glu Pro Tyr Asn Ala Thr Leu Ser Val His Gln Leu Val 
115 120 125 

Glu Asn Ala Asp Glu Cys Met Val Leu Asp Asn Glu Ala Leu Tyr Asp 
130 135 140 

Ile Cys Phe Arg Thr Leu Lys Leu Ala Asn Pro Thr Phe Gly Asp Leu 
145 150 155 160 

Asn His Leu Ile Ser Ala Thr Met Ser Gly Val Thr Cys Cys Leu Arg 
165 170 175 

Phe Pro Gly Gln Leu Asn Ser Asp Leu Arg Lys Leu Ala Val Asn Leu 
180 185 190 

Ile Pro Phe Pro Arg Leu His Phe Phe Met Val Gly Phe Ala Pro Leu 
195 200 205 

Thr Ser Arg Gly Ser Gln Gln Tyr Ser Ala Leu Ser Val Pro Glu Leu 
210 215 220 

Thr Gln Gln Met Trp Asp Ala Lys Asn Met Met Cys Ala Ala Asp Pro 
225 230 235 240 

Arg His Gly Arg Tyr Leu Thr Ala Ser Ala Val Phe Arg Gly Lys Leu 
245 250 255 

Ser Thr Lys 

(2) INFORMATION FOR SEQ ID NO: 329: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 252 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..252 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583102 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329: 
Met Asp Ser Leu Arg Ser Gly Pro Phe Gly Gln Ile Phe Arg Pro Asp 
1 5 10 15 

Asn Phe Val Phe Gly Gln Ser Gly Ala Gly Asn Asn Trp Ala Lys Gly 
20 25 30 

His Tyr Thr Glu Gly Ala Glu Leu Ile Asp Ser Val Leu Asp Val Val 
35 40 45 

Arg Lys Glu Ala Glu Asn Ser Asp Cys Leu Gln Gly Phe Gln Val Cys 
50 55 60 

His Ser Leu Gly Gly Gly Thr Gly Ser Gly Met Gly Thr Leu Leu Ile 
65 70 75 80 

Ser Lys Ile Arg Glu Glu Tyr Pro Asp Arg Met Met Met Thr Phe Ser 
85 90 95 

Val Phe Pro Ser Pro Lys Val Ser Asp Thr Val Val Glu Pro Tyr Asn 
100 105 110 

Ala Thr Leu Ser Val His Gln Leu Val Glu Asn Ala Asp Glu Cys Met 
115 120 125 

Val Leu Asp Asn Glu Ala Leu Tyr Asp Ile Cys Phe Arg Thr Leu Lys 
130 135 140 

Leu Ala Asn Pro Thr Phe Gly Asp Leu Asn His Leu Ile Ser Ala Thr 
145 150 155 160 

Met Ser Gly Val Thr Cys Cys Leu Arg Phe Pro Gly Gln Leu Asn Ser 
165 170 175 

Asp Leu Arg Lys Leu Ala Val Asn Leu Ile Pro Phe Pro Arg Leu His 
180 185 190 

Phe Phe Met Val Gly Phe Ala Pro Leu Thr Ser Arg Gly Ser Gln Gln 
195 200 205 

Tyr Ser Ala Leu Ser Val Pro Glu Leu Thr Gln Gln Met Trp Asp Ala 
210 215 220 

Lys Asn Met Met Cys Ala Ala Asp Pro Arg His Gly Arg Tyr Leu Thr 
225 230 235 240 

Ala Ser Ala Val Phe Arg Gly Lys Leu Ser Thr Lys 
245 250 

(2) INFORMATION FOR SEQ ID NO: 330: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1321 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1321 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583160 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330: 
ctaacttctc tgttcatctt tttctctctt tatttataaa tttatctgca tagtactctc 60 
tgaatctata tcttcaaaaa aaaaaaacgt ccaagatcaa atcaagaaac ccattaaaaa 120 
aaaaaatcag gttttggttt cagttttaag ggtttaatgt ttcttgggga agaaacgatg 180 
gagacttttt gtgggtttca aaaggaggaa gagcagatgg atttacctcc tgggttcagg 240 
tttcatccaa cagatgaaga actcataact cactatctcc ataagaaggt tcttgacacc 300 
agcttctcag ctaaagctat cggtgaagtt gatttaaaca aatcagagcc atgggagtta 360 
ccatggatgg caaaaatggg tgagaaagaa tggtattttt tctgtgtgag agacagaaag 420 
tatcccaccg gtttaagaac taaccgagca actgaagccg gttattggaa ggcgaccggg 480 
aaggataaag agatataccg aggcaaatca cttgttggga tgaagaagac acttgttttc 540 
tatagaggaa gagctcctaa aggtcagaaa accaactggg tgatgcatga gtacaggctt 600 
gaaggaaaat tctctgccca taacttgccg aaaaccgcaa agaatgaatg ggtgatatgc 660 
agggtgttcc aaaagagtgc tggagggaag aagatcccga tttcgagtct aatccgaatc 720 
ggttcactcg gaaccgactt taacccttcg cttttgccct ctttaaccga ttcttcgcct 780 
tacaacgata aaaccaaaac agaaccggtc tacgtgccct gcttctccaa ccaaacggat 840 
caaaaccaag gaaccacact caattgcttc agcagccctg ttcttaactc gatccaagcc 900 
gacatttttc acaggattcc actctatcaa actcagtccc tccaggtttc tatgaatcta 960 
cagagcccgg ttctcacgca agaacactca gttctacatg ctatgatcga gaacaacaga 1020 
agacaaagtc tcaaaacgat gagtgtctca caagaaaccg gagtttcaac tgacatgaac 1080 
actgatatct catcggattt tgaatttggt aagagacggt ttgattctca agaagatccg 1140 
tcttcctcta ctggaccggt tgatcttgaa cctttctgga attactgaag atgattcaag 1200 
attctcatgt ccattaattt actgtggtgt gttaaagttt gtataggcta ttgtcatata 1260 
ctctcatatc aacttccact atatattata acaatttaaa gaaacttaaa aatatgattt 1320 
g 

(2) INFORMATION FOR SEQ ID NO: 331: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..343 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583161 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331: 
Met Phe Leu Gly Glu Glu Thr Met Glu Thr Phe Cys Gly Phe Gln Lys 
1 5 10 15 

Glu Glu Glu Gln Met Asp Leu Pro Pro Gly Phe Arg Phe His Pro Thr 
20 25 30 

Asp Glu Glu Leu Ile Thr His Tyr Leu His Lys Lys Val Leu Asp Thr 
35 40 45 

Ser Phe Ser Ala Lys Ala Ile Gly Glu Val Asp Leu Asn Lys Ser Glu 
50 55 60 

Pro Trp Glu Leu Pro Trp Met Ala Lys Met Gly Glu Lys Glu Trp Tyr 
65 70 75 80 

Phe Phe Cys Val Arg Asp Arg Lys Tyr Pro Thr Gly Leu Arg Thr Asn 
85 90 95 

Arg Ala Thr Glu Ala Gly Tyr Trp Lys Ala Thr Gly Lys Asp Lys Glu 
100 105 110 

Ile Tyr Arg Gly Lys Ser Leu Val Gly Met Lys Lys Thr Leu Val Phe 
115 120 125 

Tyr Arg Gly Arg Ala Pro Lys Gly Gln Lys Thr Asn Trp Val Met His 
130 135 140 

Glu Tyr Arg Leu Glu Gly Lys Phe Ser Ala His Asn Leu Pro Lys Thr 
145 150 155 160 

Ala Lys Asn Glu Trp Val Ile Cys Arg Val Phe Gln Lys Ser Ala Gly 
165 170 175 

Gly Lys Lys Ile Pro Ile Ser Ser Leu Ile Arg Ile Gly Ser Leu Gly 
180 185 190 

Thr Asp Phe Asn Pro Ser Leu Leu Pro Ser Leu Thr Asp Ser Ser Pro 
195 200 205 

Tyr Asn Asp Lys Thr Lys Thr Glu Pro Val Tyr Val Pro Cys Phe Ser 
210 215 220 

Asn Gln Thr Asp Gln Asn Gln Gly Thr Thr Leu Asn Cys Phe Ser Ser 
225 230 235 240 

Pro Val Leu Asn Ser Ile Gln Ala Asp Ile Phe His Arg Ile Pro Leu 
245 250 255 

Tyr Gln Thr Gln Ser Leu Gln Val Ser Met Asn Leu Gln Ser Pro Val 
260 265 270 

Leu Thr Gln Glu His Ser Val Leu His Ala Met Ile Glu Asn Asn Arg 
275 280 285 

Arg Gln Ser Leu Lys Thr Met Ser Val Ser Gln Glu Thr Gly Val Ser 
290 295 300 

Thr Asp Met Asn Thr Asp Ile Ser Ser Asp Phe Glu Phe Gly Lys Arg 
305 310 315 320 

Arg Phe Asp Ser Gln Glu Asp Pro Ser Ser Ser Thr Gly Pro Val Asp 
325 330 335 

Leu Glu Pro Phe Trp Asn Tyr 
340 



(2) INFORMATION FOR SEQ ID NO: 332: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 336 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..336 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583162 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332: 
Met Glu Thr Phe Cys Gly Phe Gln Lys Glu Glu Glu Gln Met Asp Leu 
1 5 10 15 

Pro Pro Gly Phe Arg Phe His Pro Thr Asp Glu Glu Leu Ile Thr His 
20 25 30 

Tyr Leu His Lys Lys Val Leu Asp Thr Ser Phe Ser Ala Lys Ala Ile 
35 40 45 

Gly Glu Val Asp Leu Asn Lys Ser Glu Pro Trp Glu Leu Pro Trp Met 
50 55 60 

Ala Lys Met Gly Glu Lys Glu Trp Tyr Phe Phe Cys Val Arg Asp Arg 
65 70 75 80 

Lys Tyr Pro Thr Gly Leu Arg Thr Asn Arg Ala Thr Glu Ala Gly Tyr 
85 90 95 

Trp Lys Ala Thr Gly Lys Asp Lys Glu Ile Tyr Arg Gly Lys Ser Leu 
100 105 110 

Val Gly Met Lys Lys Thr Leu Val Phe Tyr Arg Gly Arg Ala Pro Lys 
115 120 125 

Gly Gln Lys Thr Asn Trp Val Met His Glu Tyr Arg Leu Glu Gly Lys 
130 135 140 

Phe Ser Ala His Asn Leu Pro Lys Thr Ala Lys Asn Glu Trp Val Ile 
145 150 155 160 

Cys Arg Val Phe Gln Lys Ser Ala Gly Gly Lys Lys Ile Pro Ile Ser 
165 170 175 

Ser Leu Ile Arg Ile Gly Ser Leu Gly Thr Asp Phe Asn Pro Ser Leu 
180 185 190 

Leu Pro Ser Leu Thr Asp Ser Ser Pro Tyr Asn Asp Lys Thr Lys Thr 
195 200 205 

Glu Pro Val Tyr Val Pro Cys Phe Ser Asn Gln Thr Asp Gln Asn Gln 
210 215 220 

Gly Thr Thr Leu Asn Cys Phe Ser Ser Pro Val Leu Asn Ser Ile Gln 
225 230 235 240 

Ala Asp Ile Phe His Arg Ile Pro Leu Tyr Gln Thr Gln Ser Leu Gln 
245 250 255 

Val Ser Met Asn Leu Gln Ser Pro Val Leu Thr Gln Glu His Ser Val 
260 265 270 

Leu His Ala Met Ile Glu Asn Asn Arg Arg Gln Ser Leu Lys Thr Met 
275 280 285 

Ser Val Ser Gln Glu Thr Gly Val Ser Thr Asp Met Asn Thr Asp Ile 
290 295 300 

Ser Ser Asp Phe Glu Phe Gly Lys Arg Arg Phe Asp Ser Gln Glu Asp 
305 310 315 320 

Pro Ser Ser Ser Thr Gly Pro Val Asp Leu Glu Pro Phe Trp Asn Tyr 
325 330 335 



(2) INFORMATION FOR SEQ ID NO: 333: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 323 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..323 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583163 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333: 
Met Asp Leu Pro Pro Gly Phe Arg Phe His Pro Thr Asp Glu Glu Leu 
1 5 10 15 

Ile Thr His Tyr Leu His Lys Lys Val Leu Asp Thr Ser Phe Ser Ala 
20 25 30 

Lys Ala Ile Gly Glu Val Asp Leu Asn Lys Ser Glu Pro Trp Glu Leu 
35 40 45 

Pro Trp Met Ala Lys Met Gly Glu Lys Glu Trp Tyr Phe Phe Cys Val 
50 55 60 

Arg Asp Arg Lys Tyr Pro Thr Gly Leu Arg Thr Asn Arg Ala Thr Glu 
65 70 75 80 

Ala Gly Tyr Trp Lys Ala Thr Gly Lys Asp Lys Glu Ile Tyr Arg Gly 
85 90 95 

Lys Ser Leu Val Gly Met Lys Lys Thr Leu Val Phe Tyr Arg Gly Arg 
100 105 110 

Ala Pro Lys Gly Gln Lys Thr Asn Trp Val Met His Glu Tyr Arg Leu 
115 120 125 

Glu Gly Lys Phe Ser Ala His Asn Leu Pro Lys Thr Ala Lys Asn Glu 
130 135 140 

Trp Val Ile Cys Arg Val Phe Gln Lys Ser Ala Gly Gly Lys Lys Ile 
145 150 155 160 

Pro Ile Ser Ser Leu Ile Arg Ile Gly Ser Leu Gly Thr Asp Phe Asn 
165 170 175 

Pro Ser Leu Leu Pro Ser Leu Thr Asp Ser Ser Pro Tyr Asn Asp Lys 
180 185 190 

Thr Lys Thr Glu Pro Val Tyr Val Pro Cys Phe Ser Asn Gln Thr Asp 
195 200 205 

Gln Asn Gln Gly Thr Thr Leu Asn Cys Phe Ser Ser Pro Val Leu Asn 
210 215 220 

Ser Ile Gln Ala Asp Ile Phe His Arg Ile Pro Leu Tyr Gln Thr Gln 
225 230 235 240 

Ser Leu Gln Val Ser Met Asn Leu Gln Ser Pro Val Leu Thr Gln Glu 
245 250 255 

His Ser Val Leu His Ala Met Ile Glu Asn Asn Arg Arg Gln Ser Leu 
260 265 270 

Lys Thr Met Ser Val Ser Gln Glu Thr Gly Val Ser Thr Asp Met Asn 
275 280 285 

Thr Asp Ile Ser Ser Asp Phe Glu Phe Gly Lys Arg Arg Phe Asp Ser 
290 295 300 

Gln Glu Asp Pro Ser Ser Ser Thr Gly Pro Val Asp Leu Glu Pro Phe 
305 310 315 320 

Trp Asn Tyr 

(2) INFORMATION FOR SEQ ID NO: 334: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 959 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..959 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583171 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334: 



aaaacaaaaa gattaaacaa agagagaaga atatggcgag agggaagatc cagatcaaga 60 

ggatagagaa ccagacaaac agacaagtga cgtattcaaa gagaagaaat ggtttattca 120 

agaaagcaca tgagctcacg gttttgtgtg atgctagggt ttcgattatc atgttctcta 180 

gctccaacaa gcttcatgag tatatcagcc ctaacaccac aacgaaggag atcgtagatc 240 

tgtaccaaac tatttctgat gtcgatgttt gggccactca atatgagcga atgcaagaaa 300 

ccaagaggaa actgttggag acaaatagaa atctccggac tcagatcaag cagaggctag 360 

gtgagtgttt gaacaagctt gacattcagg agctgcgtcg tcttgaggat gaaatggaaa 420 

acactttcaa actcgttcgc gagcgcaagt tcaaatctct tgggaatcag atcgagacca 480 

ccaagaaaaa gaacaaaagt caacaagaca tacaaaagaa tctcatacat gagctggaac 540

taagagctga agatcctcac tatggactag tagacaatgg aggagattac gactcagttc 600 

ttggatacca aatcgaaggg tcacgtgctt acgctcttcg tttccaccag aaccatcacc 660 

actattaccc caaccatggc cttcatgcac cctctgcctc tgacatcatt accttccatc 720 

ttcttgaata attaaaggct aaaaggtttg ctggtgccat cattgtctat ctaattattt 780 

agtaactact taaaacataa ggcatggtgt tgctaaaacc ttaaactgtc atgtttctta 840 

gttatgtatt ttaaagccta aagaaatatg gattgtgtga tcagtagtgc ttaggcttat 900 



tgtgtgtgga atgttttcaa gacttttatc atgtatcgta ttattatatt gaccactcc 
(2) INFORMATION FOR SEQ ID NO: 335: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..242

(D) OTHER INFORMATION: / Ceres Seq. ID 1583172 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335: 
Asn Lys Lys Ile Lys Gln Arg Glu Lys Asn Met Ala Arg Gly Lys Ile
1 5 10 15

Gln Ile Lys Arg Ile Glu Asn Gln Thr Asn Arg Gln Val Thr Tyr Ser
20 25 30

Lys Arg Arg Asn Gly Leu Phe Lys Lys Ala His Glu Leu Thr Val Leu
35 40 45

Cys Asp Ala Arg Val Ser Ile Ile Met Phe Ser Ser Ser Asn Lys Leu
50 55 60

His Glu Tyr Ile Ser Pro Asn Thr Thr Thr Lys Glu Ile Val Asp Leu
65 70 75 80

Tyr Gln Thr Ile Ser Asp Val Asp Val Trp Ala Thr Gln Tyr Glu Arg
85 90 95

Met Gln Glu Thr Lys Arg Lys Leu Leu Glu Thr Asn Arg Asn Leu Arg
100 105 110

Thr Gln Ile Lys Gln Arg Leu Gly Glu Cys Leu Asn Lys Leu Asp Ile
115 120 125

Gln Glu Leu Arg Arg Leu Glu Asp Glu Met Glu Asn Thr Phe Lys Leu
130 135 140

Val Arg Glu Arg Lys Phe Lys Ser Leu Gly Asn Gln Ile Glu Thr Thr
145 150 155 160

Lys Lys Lys Asn Lys Ser Gln Gln Asp Ile Gln Lys Asn Leu Ile His
165 170 175

Glu Leu Glu Leu Arg Ala Glu Asp Pro His Tyr Gly Leu Val Asp Asn
180 185 190

Gly Gly Asp Tyr Asp Ser Val Leu Gly Tyr Gln Ile Glu Gly Ser Arg
195 200 205

Ala Tyr Ala Leu Arg Phe His Gln Asn His His His Tyr Tyr Pro Asn
210 215 220

His Gly Leu His Ala Pro Ser Ala Ser Asp Ile Ile Thr Phe His Leu
225 230 235 240

Leu Glu



(2) INFORMATION FOR SEQ ID NO: 336: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 232 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..232

(D) OTHER INFORMATION: / Ceres Seq. ID 1583173 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336: 



Met Ala Arg Gly Lys Ile Gln Ile Lys Arg Ile Glu Asn Gln Thr Asn
1 5 10 15

Arg Gln Val Thr Tyr Ser Lys Arg Arg Asn Gly Leu Phe Lys Lys Ala
20 25 30

His Glu Leu Thr Val Leu Cys Asp Ala Arg Val Ser Ile Ile Met Phe
35 40 45

Ser Ser Ser Asn Lys Leu His Glu Tyr Ile Ser Pro Asn Thr Thr Thr
50 55 60

Lys Glu Ile Val Asp Leu Tyr Gln Thr Ile Ser Asp Val Asp Val Trp
65 70 75 80

Ala Thr Gln Tyr Glu Arg Met Gln Glu Thr Lys Arg Lys Leu Leu Glu
85 90 95

Thr Asn Arg Asn Leu Arg Thr Gln Ile Lys Gln Arg Leu Gly Glu Cys
100 105 110

Leu Asn Lys Leu Asp Ile Gln Glu Leu Arg Arg Leu Glu Asp Glu Met
115 120 125

Glu Asn Thr Phe Lys Leu Val Arg Glu Arg Lys Phe Lys Ser Leu Gly
130 135 140

Asn Gln Ile Glu Thr Thr Lys Lys Lys Asn Lys Ser Gln Gln Asp Ile
145 150 155 160

Gln Lys Asn Leu Ile His Glu Leu Glu Leu Arg Ala Glu Asp Pro His
165 170 175

Tyr Gly Leu Val Asp Asn Gly Gly Asp Tyr Asp Ser Val Leu Gly Tyr
180 185 190

Gln Ile Glu Gly Ser Arg Ala Tyr Ala Leu Arg Phe His Gln Asn His
195 200 205

His His Tyr Tyr Pro Asn His Gly Leu His Ala Pro Ser Ala Ser Asp
210 215 220

Ile Ile Thr Phe His Leu Leu Glu
225 230
(2) INFORMATION FOR SEQ ID NO: 337: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 186 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..186 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583174 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 337:
Met Phe Ser Ser Ser Asn Lys Leu His Glu Tyr Ile Ser Pro Asn Thr
1 5 10 15

Thr Thr Lys Glu Ile Val Asp Leu Tyr Gln Thr Ile Ser Asp Val Asp
20 25 30

Val Trp Ala Thr Gln Tyr Glu Arg Met Gln Glu Thr Lys Arg Lys Leu
35 40 45

Leu Glu Thr Asn Arg Asn Leu Arg Thr Gln Ile Lys Gln Arg Leu Gly
50 55 60

Glu Cys Leu Asn Lys Leu Asp Ile Gln Glu Leu Arg Arg Leu Glu Asp
65 70 75 80

Glu Met Glu Asn Thr Phe Lys Leu Val Arg Glu Arg Lys Phe Lys Ser
85 90 95

Leu Gly Asn Gln Ile Glu Thr Thr Lys Lys Lys Asn Lys Ser Gln Gln
100 105 110

Asp Ile Gln Lys Asn Leu Ile His Glu Leu Glu Leu Arg Ala Glu Asp
115 120 125

Pro His Tyr Gly Leu Val Asp Asn Gly Gly Asp Tyr Asp Ser Val Leu
130 135 140

Gly Tyr Gln Ile Glu Gly Ser Arg Ala Tyr Ala Leu Arg Phe His Gln
145 150 155 160

Asn His His His Tyr Tyr Pro Asn His Gly Leu His Ala Pro Ser Ala
165 170 175

Ser Asp Ile Ile Thr Phe His Leu Leu Glu
180 185
(2) INFORMATION FOR SEQ ID NO:338: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1248 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY: - 

(B) LOCATION: 1..1248 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583175 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 338: 
aaaatatgac atttacattc tctcaaaaga aactttctac ataatataaa aagtcacaca 60 
ttctctctat taattaaaca tgttgcttgt tgcgtttgtg actcttttgg tagcagtagc 120 
tttgcagcca ctaccgtcag tgttgtcttt ggacgttcac cttcttcggc agctagccgc 180 
aaagcacaat gtgacatcga tacttgtgtt tggagattct agtgtcgatc cagggaacaa 240
taatttcatt aaaaccgaaa tgaaagggaa ttttccacct tatggtgaaa atttcatcaa 300
ccataaaccc accggtagat tatgcgatgg attactcgct cccgattata ttgcggaggc 360
catgggttac cctcccatac cggcttttct tgatccatcc ctaacccaag ctgatctaac 420
tcgcggtgca agttttgcct ctgctggttc tggctatgac gatctcacag ctaatatatc 480
aaacgtatgg agtttcacta cacaagccaa ttactttcta cattacaaga ttcatctgac 540
taaattggtc ggtccattag agagcgctaa aatgataaat aatgctatat ttcttatgag 600
tatgggatca aatgactttc ttcaaaatta cttagtcgat tttactcgac aaaagcaatt 660
cacggttgag caatacatcg agtttctctc ccaccgtatg ctttacgacg ccaagatgtt 720
gcataggctt ggagctaaaa ggttggtggt agtgggagtt cctcccatgg gatgcatgcc 780
tttaattaaa tacctacgag gccaaaaaac ttgtgtagat caactaaacc aaatcgcttt 840
ctcctttaac gccaaaatca tcaaaaatct agagcttctc cagtctaaaa tcggtttgaa 900
aaccatctac gttgatgctt attcgaccat ccaagaagcc attaaaaatc cgagaaaatt 960
tggtttcgtc gaggcttcgc taggctgttg tggaacaggt acatacgaat atggagagac 1020
atgcaaagat atgcaagtat gcaaagatcc taccaaatac gttttttggg acgccgtcca 1080
tccaacacaa agaatgtatc aaatcattgt taagaaggca attgcatcca tcagtgaaga 1140
gtttcttgtt tagaaatatt atatgttcgt attttactat catgatttat gaaagattct 1200
agatgtaatt gaaaagcatt caaaatgtta aatttaatgg ttcaaatc
(2) INFORMATION FOR SEQ ID NO: 339:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 357 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..357

(D) OTHER INFORMATION: / Ceres Seq. ID 1583176
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339:
Met Leu Leu Val Ala Phe Val Thr Leu Leu Val Ala Val Ala Leu Gln
1 5 10 15

Pro Leu Pro Ser Val Leu Ser Leu Asp Val His Leu Leu Arg Gln Leu
20 25 30

Ala Ala Lys His Asn Val Thr Ser Ile Leu Val Phe Gly Asp Ser Ser
35 40 45

Val Asp Pro Gly Asn Asn Asn Phe Ile Lys Thr Glu Met Lys Gly Asn
50 55 60

Phe Pro Pro Tyr Gly Glu Asn Phe Ile Asn His Lys Pro Thr Gly Arg
65 70 75 80

Leu Cys Asp Gly Leu Leu Ala Pro Asp Tyr Ile Ala Glu Ala Met Gly
85 90 95

Tyr Pro Pro Ile Pro Ala Phe Leu Asp Pro Ser Leu Thr Gln Ala Asp
100 105 110

Leu Thr Arg Gly Ala Ser Phe Ala Ser Ala Gly Ser Gly Tyr Asp Asp
115 120 125

Leu Thr Ala Asn Ile Ser Asn Val Trp Ser Phe Thr Thr Gln Ala Asn
130 135 140

Tyr Phe Leu His Tyr Lys Ile His Leu Thr Lys Leu Val Gly Pro Leu
145 150 155 160

Glu Ser Ala Lys Met Ile Asn Asn Ala Ile Phe Leu Met Ser Met Gly
165 170 175

Ser Asn Asp Phe Leu Gln Asn Tyr Leu Val Asp Phe Thr Arg Gln Lys
180 185 190

Gln Phe Thr Val Glu Gln Tyr Ile Glu Phe Leu Ser His Arg Met Leu
195 200 205

Tyr Asp Ala Lys Met Leu His Arg Leu Gly Ala Lys Arg Leu Val Val
210 215 220

Val Gly Val Pro Pro Met Gly Cys Met Pro Leu Ile Lys Tyr Leu Arg
225 230 235 240

Gly Gln Lys Thr Cys Val Asp Gln Leu Asn Gln Ile Ala Phe Ser Phe
245 250 255

Asn Ala Lys Ile Ile Lys Asn Leu Glu Leu Leu Gln Ser Lys Ile Gly
260 265 270

Leu Lys Thr Ile Tyr Val Asp Ala Tyr Ser Thr Ile Gln Glu Ala Ile
275 280 285

Lys Asn Pro Arg Lys Phe Gly Phe Val Glu Ala Ser Leu Gly Cys Cys
290 295 300

Gly Thr Gly Thr Tyr Glu Tyr Gly Glu Thr Cys Lys Asp Met Gln Val
305 310 315 320

Cys Lys Asp Pro Thr Lys Tyr Val Phe Trp Asp Ala Val His Pro Thr
325 330 335

Gln Arg Met Tyr Gln Ile Ile Val Lys Lys Ala Ile Ala Ser Ile Ser
340 345 350

Glu Glu Phe Leu Val
355
























(2) INFORMATION FOR SEQ ID NO: 340:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 297 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..297

(D) OTHER INFORMATION: / Ceres Seq. ID 1583177
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 340:
Met Lys Gly Asn Phe Pro Pro Tyr Gly Glu Asn Phe Ile Asn His Lys
1 5 10 15

Pro Thr Gly Arg Leu Cys Asp Gly Leu Leu Ala Pro Asp Tyr Ile Ala
20 25 30

Glu Ala Met Gly Tyr Pro Pro Ile Pro Ala Phe Leu Asp Pro Ser Leu
35 40 45

Thr Gln Ala Asp Leu Thr Arg Gly Ala Ser Phe Ala Ser Ala Gly Ser
50 55 60

Gly Tyr Asp Asp Leu Thr Ala Asn Ile Ser Asn Val Trp Ser Phe Thr
65 70 75 80

Thr Gln Ala Asn Tyr Phe Leu His Tyr Lys Ile His Leu Thr Lys Leu
85 90 95

Val Gly Pro Leu Glu Ser Ala Lys Met Ile Asn Asn Ala Ile Phe Leu
100 105 110

Met Ser Met Gly Ser Asn Asp Phe Leu Gln Asn Tyr Leu Val Asp Phe
115 120 125

Thr Arg Gln Lys Gln Phe Thr Val Glu Gln Tyr Ile Glu Phe Leu Ser
130 135 140

His Arg Met Leu Tyr Asp Ala Lys Met Leu His Arg Leu Gly Ala Lys
145 150 155 160

Arg Leu Val Val Val Gly Val Pro Pro Met Gly Cys Met Pro Leu Ile
165 170 175

Lys Tyr Leu Arg Gly Gln Lys Thr Cys Val Asp Gln Leu Asn Gln Ile
180 185 190

Ala Phe Ser Phe Asn Ala Lys Ile Ile Lys Asn Leu Glu Leu Leu Gln
195 200 205

Ser Lys Ile Gly Leu Lys Thr Ile Tyr Val Asp Ala Tyr Ser Thr Ile
210 215 220

Gln Glu Ala Ile Lys Asn Pro Arg Lys Phe Gly Phe Val Glu Ala Ser
225 230 235 240

Leu Gly Cys Cys Gly Thr Gly Thr Tyr Glu Tyr Gly Glu Thr Cys Lys
245 250 255

Asp Met Gln Val Cys Lys Asp Pro Thr Lys Tyr Val Phe Trp Asp Ala
260 265 270

Val His Pro Thr Gln Arg Met Tyr Gln Ile Ile Val Lys Lys Ala Ile
275 280 285

Ala Ser Ile Ser Glu Glu Phe Leu Val
290 295
(2) INFORMATION FOR SEQ ID NO: 341: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..263 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583178 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 341: 
Met Gly Tyr Pro Pro Ile Pro Ala Phe Leu Asp Pro Ser Leu Thr Gln
1 5 10 15

Ala Asp Leu Thr Arg Gly Ala Ser Phe Ala Ser Ala Gly Ser Gly Tyr
20 25 30

Asp Asp Leu Thr Ala Asn Ile Ser Asn Val Trp Ser Phe Thr Thr Gln
35 40 45

Ala Asn Tyr Phe Leu His Tyr Lys Ile His Leu Thr Lys Leu Val Gly
50 55 60

Pro Leu Glu Ser Ala Lys Met Ile Asn Asn Ala Ile Phe Leu Met Ser
65 70 75 80

Met Gly Ser Asn Asp Phe Leu Gln Asn Tyr Leu Val Asp Phe Thr Arg
85 90 95

Gln Lys Gln Phe Thr Val Glu Gln Tyr Ile Glu Phe Leu Ser His Arg
100 105 110

Met Leu Tyr Asp Ala Lys Met Leu His Arg Leu Gly Ala Lys Arg Leu
115 120 125

Val Val Val Gly Val Pro Pro Met Gly Cys Met Pro Leu Ile Lys Tyr
130 135 140

Leu Arg Gly Gln Lys Thr Cys Val Asp Gln Leu Asn Gln Ile Ala Phe
145 150 155 160

Ser Phe Asn Ala Lys Ile Ile Lys Asn Leu Glu Leu Leu Gln Ser Lys
165 170 175

Ile Gly Leu Lys Thr Ile Tyr Val Asp Ala Tyr Ser Thr Ile Gln Glu
180 185 190

Ala Ile Lys Asn Pro Arg Lys Phe Gly Phe Val Glu Ala Ser Leu Gly
195 200 205

Cys Cys Gly Thr Gly Thr Tyr Glu Tyr Gly Glu Thr Cys Lys Asp Met
210 215 220

Gln Val Cys Lys Asp Pro Thr Lys Tyr Val Phe Trp Asp Ala Val His
225 230 235 240

Pro Thr Gln Arg Met Tyr Gln Ile Ile Val Lys Lys Ala Ile Ala Ser
245 250 255

Ile Ser Glu Glu Phe Leu Val
260

(2) INFORMATION FOR SEQ ID NO: 342:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1293 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY: - 

(B) LOCATION: 1..1293 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583318 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 342:
aaaaggaagt atacgtttgt ttctttgctc tgaggtatct tcgtttgata caaggcgaaa 60
tcgatcttct tcttcttctt cttcatcccc ttgcgttttc catctccatc gtctgctgaa 120 
accatttgat cgatctctca ggttacttag aatggcacaa gaagccgatg ggatcagatt 180 
ggaccaaaga catgggaagg ctcgagtgag agttggaaga gtttggcgtc atgctcatga 240 
tggatctcat cactttgttg aatggaatgt tagcatcagt cttctctctc actgtctctc 300 
ttcttaccgt cttgatgata actctgatat tgtcgccaca gataccatta aaaacactgt 360 
ttatgtgaaa gccaaggaat gtggagatcg gctctcggtg gaggaatttg ccatacttat 420 
tgggaaacac ttttgctcat tttatccaca ggtttttact gctatcgtga atatcattga 480 
gaagccctgg gagcgtgtct ccatcgatgg aaaaccacat ttacatggtt ttaagcttgg 540 
gtcagagaac catactacag aggcaagagt agaaaagtct ggtgcattaa acttaacttc 600 
cggtattgga ggactagctc tactgaagac aacccagtca ggatttgaga ggtttgtcag 660 
agacaaatac accattctgc ctgaaactcg tgagcgaatg ctggccacag aggttaatgc 720 
atcttggagg tactcgtatg agtcagttgc aagcattcca acaaaaggac tctactttag 780 
cgagaagttc atggacgtta agaaagttct gatggatact ttctttggtc caccagaaac 840 
tggtgtgtat agcccttctg tccaacgcac tctctacctc atgggaagcg ccgtactgaa 900 
aaggttcgct gatgtatcat cgattcacct aaaaatgcca aatattcatt ttctaccagt 960 
aaatctttca acgaaggaaa acccttcaat ggtgaagttt aaagatgatg tgtacctgcc 1020 
aaccgatgAa acctcatgga tctatagaag caactctaag ccgtataacc tcgaaactat 1080 
gaggttttaa gacaatacat catacatgga aatatcaaac acaatttcac aggttctgca 1140 
acttttggcg tttgcctcgt ttatgaaata acaatttaaa tgaagtgtgt aaggtcaact 1200 
tttggctttt gttgcaagaa taatactgta actgcaaaca gtttcacata aaaatgtagc 1260 
ttacatatat tatcactgat tcgcagtcat gtt 
(2) INFORMATION FOR SEQ ID NO: 343: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..362 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583319 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:343: 



Lys Gly Ser Ile Arg Leu Phe Leu Cys Ser Glu Val Ser Ser Phe Asp
1 5 10 15

Thr Arg Arg Asn Arg Ser Ser Ser Ser Ser Ser Ser Ser Pro Cys Val
20 25 30

Phe His Leu His Arg Leu Leu Lys Pro Phe Asp Arg Ser Leu Arg Leu
35 40 45

Leu Arg Met Ala Gln Glu Ala Asp Gly Ile Arg Leu Asp Gln Arg His
50 55 60

Gly Lys Ala Arg Val Arg Val Gly Arg Val Trp Arg His Ala His Asp
65 70 75 80

Gly Ser His His Phe Val Glu Trp Asn Val Ser Ile Ser Leu Leu Ser
85 90 95

His Cys Leu Ser Ser Tyr Arg Leu Asp Asp Asn Ser Asp Ile Val Ala
100 105 110

Thr Asp Thr Ile Lys Asn Thr Val Tyr Val Lys Ala Lys Glu Cys Gly
115 120 125

Asp Arg Leu Ser Val Glu Glu Phe Ala Ile Leu Ile Gly Lys His Phe
130 135 140

Cys Ser Phe Tyr Pro Gln Val Phe Thr Ala Ile Val Asn Ile Ile Glu
145 150 155 160

Lys Pro Trp Glu Arg Val Ser Ile Asp Gly Lys Pro His Leu His Gly
165 170 175

Phe Lys Leu Gly Ser Glu Asn His Thr Thr Glu Ala Arg Val Glu Lys
180 185 190

Ser Gly Ala Leu Asn Leu Thr Ser Gly Ile Gly Gly Leu Ala Leu Leu
195 200 205

Lys Thr Thr Gln Ser Gly Phe Glu Arg Phe Val Arg Asp Lys Tyr Thr
210 215 220

Ile Leu Pro Glu Thr Arg Glu Arg Met Leu Ala Thr Glu Val Asn Ala
225 230 235 240

Ser Trp Arg Tyr Ser Tyr Glu Ser Val Ala Ser Ile Pro Thr Lys Gly
245 250 255

Leu Tyr Phe Ser Glu Lys Phe Met Asp Val Lys Lys Val Leu Met Asp
260 265 270

Thr Phe Phe Gly Pro Pro Glu Thr Gly Val Tyr Ser Pro Ser Val Gln
275 280 285

Arg Thr Leu Tyr Leu Met Gly Ser Ala Val Leu Lys Arg Phe Ala Asp
290 295 300

Val Ser Ser Ile His Leu Lys Met Pro Asn Ile His Phe Leu Pro Val
305 310 315 320

Asn Leu Ser Thr Lys Glu Asn Pro Ser Met Val Lys Phe Lys Asp Asp
325 330 335

Val Tyr Leu Pro Thr Asp Glu Thr Ser Trp Ile Tyr Arg Ser Asn Ser
340 345 350

Lys Pro Tyr Asn Leu Glu Thr Met Arg Phe
355 360
(2) INFORMATION FOR SEQ ID NO: 344: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 312 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..312

(D) OTHER INFORMATION: / Ceres Seq. ID 1583320 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 344: 
Met Ala Gln Glu Ala Asp Gly Ile Arg Leu Asp Gln Arg His Gly Lys
1 5 10 15

Ala Arg Val Arg Val Gly Arg Val Trp Arg His Ala His Asp Gly Ser
20 25 30

His His Phe Val Glu Trp Asn Val Ser Ile Ser Leu Leu Ser His Cys
35 40 45

Leu Ser Ser Tyr Arg Leu Asp Asp Asn Ser Asp Ile Val Ala Thr Asp
50 55 60

Thr Ile Lys Asn Thr Val Tyr Val Lys Ala Lys Glu Cys Gly Asp Arg
65 70 75 80

Leu Ser Val Glu Glu Phe Ala Ile Leu Ile Gly Lys His Phe Cys Ser
85 90 95

Phe Tyr Pro Gln Val Phe Thr Ala Ile Val Asn Ile Ile Glu Lys Pro
100 105 110

Trp Glu Arg Val Ser Ile Asp Gly Lys Pro His Leu His Gly Phe Lys
115 120 125

Leu Gly Ser Glu Asn His Thr Thr Glu Ala Arg Val Glu Lys Ser Gly
130 135 140

Ala Leu Asn Leu Thr Ser Gly Ile Gly Gly Leu Ala Leu Leu Lys Thr
145 150 155 160

Thr Gln Ser Gly Phe Glu Arg Phe Val Arg Asp Lys Tyr Thr Ile Leu
165 170 175

Pro Glu Thr Arg Glu Arg Met Leu Ala Thr Glu Val Asn Ala Ser Trp
180 185 190

Arg Tyr Ser Tyr Glu Ser Val Ala Ser Ile Pro Thr Lys Gly Leu Tyr
195 200 205

Phe Ser Glu Lys Phe Met Asp Val Lys Lys Val Leu Met Asp Thr Phe
210 215 220

Phe Gly Pro Pro Glu Thr Gly Val Tyr Ser Pro Ser Val Gln Arg Thr
225 230 235 240

Leu Tyr Leu Met Gly Ser Ala Val Leu Lys Arg Phe Ala Asp Val Ser
245 250 255

Ser Ile His Leu Lys Met Pro Asn Ile His Phe Leu Pro Val Asn Leu
260 265 270

Ser Thr Lys Glu Asn Pro Ser Met Val Lys Phe Lys Asp Asp Val Tyr
275 280 285

Leu Pro Thr Asp Glu Thr Ser Trp Ile Tyr Arg Ser Asn Ser Lys Pro
290 295 300

Tyr Asn Leu Glu Thr Met Arg Phe
305 310
(2) INFORMATION FOR SEQ ID NO: 345:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1696 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1696 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583381 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 345: 



agagagacca aaacaaaaaa agctttgaat atccgccatg gatgttggac gatgctttct 60 

ctttcttcta ttaccaagtt tcttcttctt gccttcacaa acgcagagca ccgattcatt 120 

cacatcggtt ctcgtatctc aaaacggact cgatttcgtc aaaaatcttc tagttaacaa 180 

agctatcgcg tcgattatac ctctccagat tcccaggatt gagaaatcta tgaaaatccc 240 

gttccttgga ggaatcgatg ttgttgtctc gaatctcacg atatacgagc ttgacgttgc 300 

gtcatcttat gttaagcttg gtgagacagg tgttgtcatc gtcgcttcag ggacgacttg 360 

taatttgagt atgaattggc attattctta cagtacttgg cttcctccta ttgaaatatc 420 

tgatcagggt attgcatctg ttcaggttca aggcatggaa atcgggctct ctctaggctt 480 

gaaaagtgat gaaggaggct taaaactttc tctttcggaa tgtggatgcc atgtggaaga 540

cattaccatt gagttggaag gaggagcatc atggttttat caggggatgg ttaatgcatt 600 

taaagaccaa attgggtcaa gtgtggaaag taccattgcc aagaaactca cggaaggagt 660 

atcagacctt gattcattcc tacaaaacct tccaaaggaa atcccagtgg atgataatgc 720 

tgaccttaat gtcactttca ccagtgatcc tatattgaga aattcatcta tcacttttga 780 

gattgatggc ttgttcacca aaggagaaac aaatcaagtc ttaaaatcct tcttcaaaaa 840

gtctgtatct ttagttatct gccctggaaa ttctaaaatg ctcggaattt cagttgacga 900 

agctgttttt aactctgctg cagctttgta ctataatgca gactttgtgc aatgggttgt 960 

ggataaaata ccagagcagt ctcttttaaa cactgctagg tggaggttca tcattccaca 1020 

actatacaag aaatacccga accaggatat gaatctgaac atcagtttgt cctcacctcc 1080 

gcttgtaaag atatcagagc aatacgttgg agctaatgtt aatgcagacc tagtaatcaa 1140 

tgttctagat gcaaaccaag taatacctgt agcttgcatc tccttgatga tccggggatc 1200 

tggtgctctc agagtcatgg gcaataacct tggaggcagc gtaagtttag aagatttctc 1260 

catgtcCttt gaaatggagc aacattggaa atctccatct gcatcttctt cagccaatag 1320 

tgtggactgt tatacaaaca gtgtttgtgc catatgcaaa tgaccaccta gaaaagggat 1380 

tcccgttgcc cataatgcac ggattcacac ttcagaatgc ggaaataatc tgctcagaat 1440 

ctgaaatcac agtttgcagc gatgtcgcct acttggattc gtcccaacag cctcaatggc 1500 

tttgaagcta ccacaatcta cgacttggaa acctaactcg ttgtagttta caattggtgg 1560 

ttgtaagtgt cgctactggt ttggcatttg gaccagactc agcgaccaaa acgaacaaag 1620 

ctttgtataa tacttttgta cttaatgttg actaatctgt gaatgatatt tctgtctgaa 1680 



ataaaatagt ctttgc 

(2) INFORMATION FOR SEQ ID NO: 346: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 441 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 
(B) LOCATION: 1..441

(D) OTHER INFORMATION: / Ceres Seq. ID 1583382 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 346: 
Met Asp Val Gly Arg Cys Phe Leu Phe Leu Leu Leu Pro Ser Phe Phe
1 5 10 15

Phe Leu Pro Ser Gln Thr Gln Ser Thr Asp Ser Phe Thr Ser Val Leu
20 25 30

Val Ser Gln Asn Gly Leu Asp Phe Val Lys Asn Leu Leu Val Asn Lys
35 40 45

Ala Ile Ala Ser Ile Ile Pro Leu Gln Ile Pro Arg Ile Glu Lys Ser
50 55 60

Met Lys Ile Pro Phe Leu Gly Gly Ile Asp Val Val Val Ser Asn Leu
65 70 75 80

Thr Ile Tyr Glu Leu Asp Val Ala Ser Ser Tyr Val Lys Leu Gly Glu
85 90 95

Thr Gly Val Val Ile Val Ala Ser Gly Thr Thr Cys Asn Leu Ser Met
100 105 110

Asn Trp His Tyr Ser Tyr Ser Thr Trp Leu Pro Pro Ile Glu Ile Ser
115 120 125

Asp Gln Gly Ile Ala Ser Val Gln Val Gln Gly Met Glu Ile Gly Leu
130 135 140

Ser Leu Gly Leu Lys Ser Asp Glu Gly Gly Leu Lys Leu Ser Leu Ser
145 150 155 160

Glu Cys Gly Cys His Val Glu Asp Ile Thr Ile Glu Leu Glu Gly Gly
165 170 175

Ala Ser Trp Phe Tyr Gln Gly Met Val Asn Ala Phe Lys Asp Gln Ile
180 185 190

Gly Ser Ser Val Glu Ser Thr Ile Ala Lys Lys Leu Thr Glu Gly Val
195 200 205

Ser Asp Leu Asp Ser Phe Leu Gln Asn Leu Pro Lys Glu Ile Pro Val
210 215 220

Asp Asp Asn Ala Asp Leu Asn Val Thr Phe Thr Ser Asp Pro Ile Leu
225 230 235 240

Arg Asn Ser Ser Ile Thr Phe Glu Ile Asp Gly Leu Phe Thr Lys Gly
245 250 255

Glu Thr Asn Gln Val Leu Lys Ser Phe Phe Lys Lys Ser Val Ser Leu
260 265 270

Val Ile Cys Pro Gly Asn Ser Lys Met Leu Gly Ile Ser Val Asp Glu
275 280 285

Ala Val Phe Asn Ser Ala Ala Ala Leu Tyr Tyr Asn Ala Asp Phe Val
290 295 300

Gln Trp Val Val Asp Lys Ile Pro Glu Gln Ser Leu Leu Asn Thr Ala
305 310 315 320

Arg Trp Arg Phe Ile Ile Pro Gln Leu Tyr Lys Lys Tyr Pro Asn Gln
325 330 335

Asp Met Asn Leu Asn Ile Ser Leu Ser Ser Pro Pro Leu Val Lys Ile
340 345 350

Ser Glu Gln Tyr Val Gly Ala Asn Val Asn Ala Asp Leu Val Ile Asn
355 360 365

Val Leu Asp Ala Asn Gln Val Ile Pro Val Ala Cys Ile Ser Leu Met
370 375 380

Ile Arg Gly Ser Gly Ala Leu Arg Val Met Gly Asn Asn Leu Gly Gly
385 390 395 400

Ser Val Ser Leu Glu Asp Phe Ser Met Ser Phe Glu Met Glu Gln His
405 410 415

Trp Lys Ser Pro Ser Ala Ser Ser Ser Ala Asn Ser Val Asp Cys Tyr
420 425 430

Thr Asn Ser Val Cys Ala Ile Cys Lys
435 440
(2) INFORMATION FOR SEQ ID NO: 347: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 377 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..377 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583383 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 347:
Met Lys Ile Pro Phe Leu Gly Gly Ile Asp Val Val Val Ser Asn Leu
1 5 10 15

Thr Ile Tyr Glu Leu Asp Val Ala Ser Ser Tyr Val Lys Leu Gly Glu
20 25 30

Thr Gly Val Val Ile Val Ala Ser Gly Thr Thr Cys Asn Leu Ser Met
35 40 45

Asn Trp His Tyr Ser Tyr Ser Thr Trp Leu Pro Pro Ile Glu Ile Ser
50 55 60

Asp Gln Gly Ile Ala Ser Val Gln Val Gln Gly Met Glu Ile Gly Leu
65 70 75 80

Ser Leu Gly Leu Lys Ser Asp Glu Gly Gly Leu Lys Leu Ser Leu Ser
85 90 95

Glu Cys Gly Cys His Val Glu Asp Ile Thr Ile Glu Leu Glu Gly Gly
100 105 110

Ala Ser Trp Phe Tyr Gln Gly Met Val Asn Ala Phe Lys Asp Gln Ile
115 120 125

Gly Ser Ser Val Glu Ser Thr Ile Ala Lys Lys Leu Thr Glu Gly Val
130 135 140

Ser Asp Leu Asp Ser Phe Leu Gln Asn Leu Pro Lys Glu Ile Pro Val
145 150 155 160

Asp Asp Asn Ala Asp Leu Asn Val Thr Phe Thr Ser Asp Pro Ile Leu
165 170 175

Arg Asn Ser Ser Ile Thr Phe Glu Ile Asp Gly Leu Phe Thr Lys Gly
180 185 190

Glu Thr Asn Gln Val Leu Lys Ser Phe Phe Lys Lys Ser Val Ser Leu
195 200 205

Val Ile Cys Pro Gly Asn Ser Lys Met Leu Gly Ile Ser Val Asp Glu
210 215 220

Ala Val Phe Asn Ser Ala Ala Ala Leu Tyr Tyr Asn Ala Asp Phe Val
225 230 235 240

Gln Trp Val Val Asp Lys Ile Pro Glu Gln Ser Leu Leu Asn Thr Ala
245 250 255

Arg Trp Arg Phe Ile Ile Pro Gln Leu Tyr Lys Lys Tyr Pro Asn Gln
260 265 270

Asp Met Asn Leu Asn Ile Ser Leu Ser Ser Pro Pro Leu Val Lys Ile
275 280 285

Ser Glu Gln Tyr Val Gly Ala Asn Val Asn Ala Asp Leu Val Ile Asn
290 295 300

Val Leu Asp Ala Asn Gln Val Ile Pro Val Ala Cys Ile Ser Leu Met
305 310 315 320

Ile Arg Gly Ser Gly Ala Leu Arg Val Met Gly Asn Asn Leu Gly Gly
325 330 335

Ser Val Ser Leu Glu Asp Phe Ser Met Ser Phe Glu Met Glu Gln His
340 345 350






Trp 


Lys 


Ser 


Pro 


Ser 


Ala 


Ser 


Ser 


Ser 


Ala 


Asn 


Ser 


Val 


Asp 


Cys 


Tyr 






355 










360 










365 








Thr 


Asn 


Ser 


Val 


Cys 


Ala 


He 


Cys 


Lys 

















370 375 
(2) INFORMATION FOR SEQ ID NO:348: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..330

(D) OTHER INFORMATION: / Ceres Seq. ID 1583384 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 348: 
Met Asn Trp His Tyr Ser Tyr Ser Thr Trp Leu Pro Pro Ile Glu Ile
1 5 10 15

Ser Asp Gln Gly Ile Ala Ser Val Gln Val Gln Gly Met Glu Ile Gly
20 25 30

Leu Ser Leu Gly Leu Lys Ser Asp Glu Gly Gly Leu Lys Leu Ser Leu
35 40 45

Ser Glu Cys Gly Cys His Val Glu Asp Ile Thr Ile Glu Leu Glu Gly
50 55 60

Gly Ala Ser Trp Phe Tyr Gln Gly Met Val Asn Ala Phe Lys Asp Gln
65 70 75 80

Ile Gly Ser Ser Val Glu Ser Thr Ile Ala Lys Lys Leu Thr Glu Gly
85 90 95

Val Ser Asp Leu Asp Ser Phe Leu Gln Asn Leu Pro Lys Glu Ile Pro
100 105 110

Val Asp Asp Asn Ala Asp Leu Asn Val Thr Phe Thr Ser Asp Pro Ile
115 120 125

Leu Arg Asn Ser Ser Ile Thr Phe Glu Ile Asp Gly Leu Phe Thr Lys
130 135 140

Gly Glu Thr Asn Gln Val Leu Lys Ser Phe Phe Lys Lys Ser Val Ser
145 150 155 160

Leu Val Ile Cys Pro Gly Asn Ser Lys Met Leu Gly Ile Ser Val Asp
165 170 175

Glu Ala Val Phe Asn Ser Ala Ala Ala Leu Tyr Tyr Asn Ala Asp Phe
180 185 190

Val Gln Trp Val Val Asp Lys Ile Pro Glu Gln Ser Leu Leu Asn Thr
195 200 205

Ala Arg Trp Arg Phe Ile Ile Pro Gln Leu Tyr Lys Lys Tyr Pro Asn
210 215 220

Gln Asp Met Asn Leu Asn Ile Ser Leu Ser Ser Pro Pro Leu Val Lys
225 230 235 240

Ile Ser Glu Gln Tyr Val Gly Ala Asn Val Asn Ala Asp Leu Val Ile
245 250 255

Asn Val Leu Asp Ala Asn Gln Val Ile Pro Val Ala Cys Ile Ser Leu
260 265 270

Met Ile Arg Gly Ser Gly Ala Leu Arg Val Met Gly Asn Asn Leu Gly
275 280 285

Gly Ser Val Ser Leu Glu Asp Phe Ser Met Ser Phe Glu Met Glu Gln
290 295 300

His Trp Lys Ser Pro Ser Ala Ser Ser Ser Ala Asn Ser Val Asp Cys
305 310 315 320

Tyr Thr Asn Ser Val Cys Ala Ile Cys Lys
325 330
(2) INFORMATION FOR SEQ ID NO:349: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 861 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..861
(D) OTHER INFORMATION: / Ceres Seq. ID 1583403
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 349: 

aatttttaca cacagagtag agcagagaga gagagagaga gagagatggg tgattctgac 60 

gtcggtgatc gtcttccccc tccatcttct tccgacgaac tctcgagctt tctccgacat 120 

attctttccc gtactcctac agctcaacct tcttcaccac cgaagagtac taatgtttcc 180 

tccgctgaga ccttcttccc ttccgtttcc ggcggactgt ttcttccgtc ggttatggag 240 

tctctgaaac tggccaagac aaatatgctt tcgaacacaa gagaagtgga gctaaacaga 300 

gaaattcgtt gaagagaaac attgatgctc aattccacaa cttgtctgaa aagaagagga 360 

ggagcaagat caacgagaaa atgaaagctt tgcagaaact cattcccaat tccaacaaga 420 

ctgataaagc ctcaatgctt gatgaagcta tagaatatct gaagcagctt caacttcaag 480

tccagacttt agccgttatg aatggtttag gcttaaaccc tatgcgatta ccacaggttc 540 

cacctccaac tcatacaagg atcaatgaga ccttagagca agacctgaac ctagagactc 600 

ttctcgctgc tcctcactcg ctggaaccag ctaaaacaag tcaaggaatg tgcttttcca 660 

cagccactct gctttgaaga taacattcag acaatgatga tgatcggaat tcctctagta 720 

cctgccagac aggagtgaac aatgttttga gttttagcat tggccagatt tctatgttca 780 

gttatagtta tgctaataag ctttaggagt gaacaaaatc tgagtagttt gattataatg 840 



atgtctgaag cagattatat g 
(2) INFORMATION FOR SEQ ID NO: 350: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..103 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583404 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 350: 



Asn Phe Tyr Thr Gln Ser Arg Ala Glu Arg Glu Arg Glu Arg Glu Met
1 5 10 15

Gly Asp Ser Asp Val Gly Asp Arg Leu Pro Pro Pro Ser Ser Ser Asp
20 25 30

Glu Leu Ser Ser Phe Leu Arg His Ile Leu Ser Arg Thr Pro Thr Ala
35 40 45

Gln Pro Ser Ser Pro Pro Lys Ser Thr Asn Val Ser Ser Ala Glu Thr
50 55 60

Phe Phe Pro Ser Val Ser Gly Gly Leu Phe Leu Pro Ser Val Met Glu
65 70 75 80

Ser Leu Lys Leu Ala Lys Thr Asn Met Leu Ser Asn Thr Arg Glu Val
85 90 95

Glu Leu Asn Arg Glu Ile Arg
100



(2) INFORMATION FOR SEQ ID NO: 351: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 88 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..88 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583405 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 351: 
Met Gly Asp Ser Asp Val Gly Asp Arg Leu Pro Pro Pro Ser Ser Ser
1 5 10 15

Asp Glu Leu Ser Ser Phe Leu Arg His Ile Leu Ser Arg Thr Pro Thr
20 25 30

Ala Gln Pro Ser Ser Pro Pro Lys Ser Thr Asn Val Ser Ser Ala Glu
35 40 45

Thr Phe Phe Pro Ser Val Ser Gly Gly Leu Phe Leu Pro Ser Val Met
50 55 60

Glu Ser Leu Lys Leu Ala Lys Thr Asn Met Leu Ser Asn Thr Arg Glu
65 70 75 80

Val Glu Leu Asn Arg Glu Ile Arg
85

(2) INFORMATION FOR SEQ ID NO: 352: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 98 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..98

(D) OTHER INFORMATION: / Ceres Seq. ID 1583406 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 352: 



Met Lys Ala Leu Gln Lys Leu Ile Pro Asn Ser Asn Lys Thr Asp Lys
1 5 10 15

Ala Ser Met Leu Asp Glu Ala Ile Glu Tyr Leu Lys Gln Leu Gln Leu
20 25 30

Gln Val Gln Thr Leu Ala Val Met Asn Gly Leu Gly Leu Asn Pro Met
35 40 45

Arg Leu Pro Gln Val Pro Pro Pro Thr His Thr Arg Ile Asn Glu Thr
50 55 60

Leu Glu Gln Asp Leu Asn Leu Glu Thr Leu Leu Ala Ala Pro His Ser
65 70 75 80

Leu Glu Pro Ala Lys Thr Ser Gln Gly Met Cys Phe Ser Thr Ala Thr
85 90 95

Leu Leu



(2) INFORMATION FOR SEQ ID NO: 353: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1308 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1308 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583478 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 353: 



atattttcat ataaataaac ctctcaacct ccacactttc tcacccatca cacaatcctc 60 

aaaacagagt aacccaaaaa acagagcaat ctctaaaaaa tctcaagaaa cctcactaaa 120 

atgggttcaa cggcggagac acaattaact ccggtgcaag tcaccgacga cgaagctgcc 180 

ctcttcgcca tgcaactagc cagtgcttcc gttcttccga tggctttaaa atccgcctta 240 

gagcttgacc ttcttgagat tatggccaag aatggttctc ccatgtctcc taccgagatc 300 

gcttctaaac ttccgaccaa aaaccctgaa gctccggtca tgctcgaccg tatcctccgt 360 

cttcttacgt cttactccgt cttaacctgc tccaaccgta aactttccgg tgatggcgtt 420 

gaacggattt acgggcttgg tccggtttgc aagtatttga ccaagaacga agatggtgtt 480 

tccattgctg ctctttgtct tatgaaccaa gacaaggttc tcatggaaag ctggtaccat 540

ttgaaggatg caattcttga tggtgggatt ccattcaaca aggcttatgg aatgagcgcg 600 

ttcgagtacc acgggactga ccctagattc aacaaggtct ttaacaatgg aatgtctaac 660 

cattccacaa tcaccatgaa gaagattctt gagacctata agggttttga agggttgact 720 

tctttggttg atgttggtgg tggcattggt gctacactca aaatgattgt ctccaagtac 780 

cctaatctta aaggcatcaa ctttgatctc ccacatgtca ttgaagatgc tccttctcat 840 

cctggtattg agcatgttgg aggagatatg tttgtaagtg tccctaaagg tgatgccata 900 

ttcatgaagt ggatatgtca tgactggagt gacgaacatt gcgtgaaatt cttgaaaaac 960 

tgctacgagt cacttccaga ggatggaaaa gtgatattag cagagtgtat acttccagag 1020 



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 2 
Page 209 



acaccagact caagcctctc aaccaaacaa gtagtccatg tcgattgcat tatgttggct 1080 
cacaatcccg gaggcaaaga acgaaccgag aaagagtttg aggcattagc caaagcatca 1140 
ggcttcaagg gcatcaaagt tgtctgcgac gcttttggtg ttaaccttat tgagttactc 1200 
aagaagctct aaaaacaaac aatgttccta tgaagatgat ttatatgtaa acattatctc 1260 
atatctcctt ccacggttcc aaaactatgc tgtttaataa tggttttt 
(2) INFORMATION FOR SEQ ID NO: 354: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 363 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..363 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583479
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 354:



Met Gly Ser Thr Ala Glu Thr Gln Leu Thr Pro Val Gln Val Thr Asp
1 5 10 15

Asp Glu Ala Ala Leu Phe Ala Met Gln Leu Ala Ser Ala Ser Val Leu
20 25 30

Pro Met Ala Leu Lys Ser Ala Leu Glu Leu Asp Leu Leu Glu Ile Met
35 40 45

Ala Lys Asn Gly Ser Pro Met Ser Pro Thr Glu Ile Ala Ser Lys Leu
50 55 60

Pro Thr Lys Asn Pro Glu Ala Pro Val Met Leu Asp Arg Ile Leu Arg
65 70 75 80

Leu Leu Thr Ser Tyr Ser Val Leu Thr Cys Ser Asn Arg Lys Leu Ser
85 90 95

Gly Asp Gly Val Glu Arg Ile Tyr Gly Leu Gly Pro Val Cys Lys Tyr
100 105 110

Leu Thr Lys Asn Glu Asp Gly Val Ser Ile Ala Ala Leu Cys Leu Met
115 120 125

Asn Gln Asp Lys Val Leu Met Glu Ser Trp Tyr His Leu Lys Asp Ala
130 135 140

Ile Leu Asp Gly Gly Ile Pro Phe Asn Lys Ala Tyr Gly Met Ser Ala
145 150 155 160

Phe Glu Tyr His Gly Thr Asp Pro Arg Phe Asn Lys Val Phe Asn Asn
165 170 175

Gly Met Ser Asn His Ser Thr Ile Thr Met Lys Lys Ile Leu Glu Thr
180 185 190

Tyr Lys Gly Phe Glu Gly Leu Thr Ser Leu Val Asp Val Gly Gly Gly
195 200 205

Ile Gly Ala Thr Leu Lys Met Ile Val Ser Lys Tyr Pro Asn Leu Lys
210 215 220

Gly Ile Asn Phe Asp Leu Pro His Val Ile Glu Asp Ala Pro Ser His
225 230 235 240

Pro Gly Ile Glu His Val Gly Gly Asp Met Phe Val Ser Val Pro Lys
245 250 255

Gly Asp Ala Ile Phe Met Lys Trp Ile Cys His Asp Trp Ser Asp Glu
260 265 270

His Cys Val Lys Phe Leu Lys Asn Cys Tyr Glu Ser Leu Pro Glu Asp
275 280 285

Gly Lys Val Ile Leu Ala Glu Cys Ile Leu Pro Glu Thr Pro Asp Ser
290 295 300

Ser Leu Ser Thr Lys Gln Val Val His Val Asp Cys Ile Met Leu Ala
305 310 315 320

His Asn Pro Gly Gly Lys Glu Arg Thr Glu Lys Glu Phe Glu Ala Leu
325 330 335

Ala Lys Ala Ser Gly Phe Lys Gly Ile Lys Val Val Cys Asp Ala Phe
340 345 350

Gly Val Asn Leu Ile Glu Leu Leu Lys Lys Leu
355 360
(2) INFORMATION FOR SEQ ID NO: 355: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 340 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..340 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583480 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 355: 



Met Gln Leu Ala Ser Ala Ser Val Leu Pro Met Ala Leu Lys Ser Ala
1 5 10 15

Leu Glu Leu Asp Leu Leu Glu Ile Met Ala Lys Asn Gly Ser Pro Met
20 25 30

Ser Pro Thr Glu Ile Ala Ser Lys Leu Pro Thr Lys Asn Pro Glu Ala
35 40 45

Pro Val Met Leu Asp Arg Ile Leu Arg Leu Leu Thr Ser Tyr Ser Val
50 55 60

Leu Thr Cys Ser Asn Arg Lys Leu Ser Gly Asp Gly Val Glu Arg Ile
65 70 75 80

Tyr Gly Leu Gly Pro Val Cys Lys Tyr Leu Thr Lys Asn Glu Asp Gly
85 90 95

Val Ser Ile Ala Ala Leu Cys Leu Met Asn Gln Asp Lys Val Leu Met
100 105 110

Glu Ser Trp Tyr His Leu Lys Asp Ala Ile Leu Asp Gly Gly Ile Pro
115 120 125

Phe Asn Lys Ala Tyr Gly Met Ser Ala Phe Glu Tyr His Gly Thr Asp
130 135 140

Pro Arg Phe Asn Lys Val Phe Asn Asn Gly Met Ser Asn His Ser Thr
145 150 155 160

Ile Thr Met Lys Lys Ile Leu Glu Thr Tyr Lys Gly Phe Glu Gly Leu
165 170 175

Thr Ser Leu Val Asp Val Gly Gly Gly Ile Gly Ala Thr Leu Lys Met
180 185 190

Ile Val Ser Lys Tyr Pro Asn Leu Lys Gly Ile Asn Phe Asp Leu Pro
195 200 205

His Val Ile Glu Asp Ala Pro Ser His Pro Gly Ile Glu His Val Gly
210 215 220

Gly Asp Met Phe Val Ser Val Pro Lys Gly Asp Ala Ile Phe Met Lys
225 230 235 240

Trp Ile Cys His Asp Trp Ser Asp Glu His Cys Val Lys Phe Leu Lys
245 250 255

Asn Cys Tyr Glu Ser Leu Pro Glu Asp Gly Lys Val Ile Leu Ala Glu
260 265 270

Cys Ile Leu Pro Glu Thr Pro Asp Ser Ser Leu Ser Thr Lys Gln Val
275 280 285

Val His Val Asp Cys Ile Met Leu Ala His Asn Pro Gly Gly Lys Glu
290 295 300

Arg Thr Glu Lys Glu Phe Glu Ala Leu Ala Lys Ala Ser Gly Phe Lys
305 310 315 320

Gly Ile Lys Val Val Cys Asp Ala Phe Gly Val Asn Leu Ile Glu Leu
325 330 335

Leu Lys Lys Leu
340

(2) INFORMATION FOR SEQ ID NO: 356:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..330 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583481 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 356: 
Met Ala Leu Lys Ser Ala Leu Glu Leu Asp Leu Leu Glu Ile Met Ala
1 5 10 15

Lys Asn Gly Ser Pro Met Ser Pro Thr Glu Ile Ala Ser Lys Leu Pro
20 25 30

Thr Lys Asn Pro Glu Ala Pro Val Met Leu Asp Arg Ile Leu Arg Leu
35 40 45

Leu Thr Ser Tyr Ser Val Leu Thr Cys Ser Asn Arg Lys Leu Ser Gly
50 55 60

Asp Gly Val Glu Arg Ile Tyr Gly Leu Gly Pro Val Cys Lys Tyr Leu
65 70 75 80

Thr Lys Asn Glu Asp Gly Val Ser Ile Ala Ala Leu Cys Leu Met Asn
85 90 95

Gln Asp Lys Val Leu Met Glu Ser Trp Tyr His Leu Lys Asp Ala Ile
100 105 110

Leu Asp Gly Gly Ile Pro Phe Asn Lys Ala Tyr Gly Met Ser Ala Phe
115 120 125

Glu Tyr His Gly Thr Asp Pro Arg Phe Asn Lys Val Phe Asn Asn Gly
130 135 140

Met Ser Asn His Ser Thr Ile Thr Met Lys Lys Ile Leu Glu Thr Tyr
145 150 155 160

Lys Gly Phe Glu Gly Leu Thr Ser Leu Val Asp Val Gly Gly Gly Ile
165 170 175

Gly Ala Thr Leu Lys Met Ile Val Ser Lys Tyr Pro Asn Leu Lys Gly
180 185 190

Ile Asn Phe Asp Leu Pro His Val Ile Glu Asp Ala Pro Ser His Pro
195 200 205

Gly Ile Glu His Val Gly Gly Asp Met Phe Val Ser Val Pro Lys Gly
210 215 220

Asp Ala Ile Phe Met Lys Trp Ile Cys His Asp Trp Ser Asp Glu His
225 230 235 240

Cys Val Lys Phe Leu Lys Asn Cys Tyr Glu Ser Leu Pro Glu Asp Gly
245 250 255

Lys Val Ile Leu Ala Glu Cys Ile Leu Pro Glu Thr Pro Asp Ser Ser
260 265 270

Leu Ser Thr Lys Gln Val Val His Val Asp Cys Ile Met Leu Ala His
275 280 285

Asn Pro Gly Gly Lys Glu Arg Thr Glu Lys Glu Phe Glu Ala Leu Ala
290 295 300

Lys Ala Ser Gly Phe Lys Gly Ile Lys Val Val Cys Asp Ala Phe Gly
305 310 315 320

Val Asn Leu Ile Glu Leu Leu Lys Lys Leu
325 330
(2) INFORMATION FOR SEQ ID NO: 357: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1591 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1591
(D) OTHER INFORMATION: / Ceres Seq. ID 1583482
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 357: 

ctcttgtctt ttgtctctcc accaattttt tcttgttctt tctctctcca ccacataaaa 60 

aaaaaaaaac ctagctttgt cccctcaact cactgattga actgcttgat tttcgattga 120 

tcatctgggt ggttttggat cgaagagtat tgttgtatta gtggctggtg gctctccaaa 180 

agagtaaggc cggagagaga aatcaatggc ctctggcggc ggagaggcgg ataaatcact 240

tgaaatcggg tccgggaccg cggatcccaa aataggcggt actgggagca ggagcgccgg 300 

agaagaacga tacttcaggg cagatacact ggatttcagt aaatgggatt tgcatatggg 360 

tcaaacctct actagcagcg tcctcaccaa ttccgcttcc acgagcgctc ccgcaccggc 420 

gatgcaggaa tgggagattg acctctccaa actcgatatg aagcacgtcc tcgctcacgg 480 

tacttacggc actgtctacc gcggtgtcta cgccggccaa gaagtcgcag tgaaagtgtt 540 

agattgggga gaagatggtt acgccacacc agctgaaact acaactctcc gtgcttcctt 600 

cgagcaagag gtcgccgtct ggcagaagct cgatcatccc aacgttacca agttcatagg 660 

agcatccatg ggaacctctg atctgcggat ccctcctgct ggtgatactg gcggacgtgg 720 

taacggtgca catcctgcga gggcctgttg tgttgtggtt gaatatgttg ccggaggcac 780 

gcttaagaag ttcctcatca agaaatatag ggccaaacta cccatcaagg atgtcattca 840 

gctcgctttg gatctcgcta gagggcttag ttacctccac tccaaggcga ttgtacatag 900 

ggacgtgaag tcagagaaca tgctgttaca gcctaacaag acgctgaaga tcgctgattt 960 

cggggtagct agagttgaag ctcagaaccc tcaagacatg acgggtggaa ctggaacact 1020 

tggatacatg gcaccagagg ttcttgaagg aaagccttac aacaggaaat gcgatgtcta 1080 

tagctttggg gtatgcctct gggAaatata ctgctgtgac atgccctatg ctgactgtag 1140 

ttttgctgag atctctcacg ccgttgttca taggaatctg agaccagaga ttccgaaatg 1200 

ctgcccgcat gcagtggcaa acatcatgaa gagatgctgg gacccgaatc cagacaggcg 1260 

tccggagatg gaggaggtgg tgaagctgct tgaagccata gacacaagca aaggtggtgg 1320 

aatgatagct ccggaccagt ttcaggggtg cctctgtttc ttcaaacctc gaggcccctg 1380 

aatctctctc cctctctttc ctttttgctc cgtgtctgat atattcttga gagctgcgtg 1440 

attctttgga ttttgtattt actttgagct atgggagttg gattggtgtg ggttttgtca 1500 

taagaatctt tctgcgctct atgtatttat atacttaaca cagtcgtgta taattcgatt 1560 



aagctttatt ttattttttg atgttgattc c 
(2) INFORMATION FOR SEQ ID NO: 358: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 391 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..391 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583483 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 358: 



Met Ala Ser Gly Gly Gly Glu Ala Asp Lys Ser Leu Glu Ile Gly Ser
1 5 10 15

Gly Thr Ala Asp Pro Lys Ile Gly Gly Thr Gly Ser Arg Ser Ala Gly
20 25 30

Glu Glu Arg Tyr Phe Arg Ala Asp Thr Leu Asp Phe Ser Lys Trp Asp
35 40 45

Leu His Met Gly Gln Thr Ser Thr Ser Ser Val Leu Thr Asn Ser Ala
50 55 60

Ser Thr Ser Ala Pro Ala Pro Ala Met Gln Glu Trp Glu Ile Asp Leu
65 70 75 80

Ser Lys Leu Asp Met Lys His Val Leu Ala His Gly Thr Tyr Gly Thr
85 90 95

Val Tyr Arg Gly Val Tyr Ala Gly Gln Glu Val Ala Val Lys Val Leu
100 105 110

Asp Trp Gly Glu Asp Gly Tyr Ala Thr Pro Ala Glu Thr Thr Thr Leu
115 120 125

Arg Ala Ser Phe Glu Gln Glu Val Ala Val Trp Gln Lys Leu Asp His
130 135 140

Pro Asn Val Thr Lys Phe Ile Gly Ala Ser Met Gly Thr Ser Asp Leu
145 150 155 160

Arg Ile Pro Pro Ala Gly Asp Thr Gly Gly Arg Gly Asn Gly Ala His
165 170 175

Pro Ala Arg Ala Cys Cys Val Val Val Glu Tyr Val Ala Gly Gly Thr
180 185 190

Leu Lys Lys Phe Leu Ile Lys Lys Tyr Arg Ala Lys Leu Pro Ile Lys
195 200 205

Asp Val Ile Gln Leu Ala Leu Asp Leu Ala Arg Gly Leu Ser Tyr Leu
210 215 220

His Ser Lys Ala Ile Val His Arg Asp Val Lys Ser Glu Asn Met Leu
225 230 235 240

Leu Gln Pro Asn Lys Thr Leu Lys Ile Ala Asp Phe Gly Val Ala Arg
245 250 255

Val Glu Ala Gln Asn Pro Gln Asp Met Thr Gly Gly Thr Gly Thr Leu
260 265 270

Gly Tyr Met Ala Pro Glu Val Leu Glu Gly Lys Pro Tyr Asn Arg Lys
275 280 285

Cys Asp Val Tyr Ser Phe Gly Val Cys Leu Trp Glu Ile Tyr Cys Cys
290 295 300

Asp Met Pro Tyr Ala Asp Cys Ser Phe Ala Glu Ile Ser His Ala Val
305 310 315 320

Val His Arg Asn Leu Arg Pro Glu Ile Pro Lys Cys Cys Pro His Ala
325 330 335

Val Ala Asn Ile Met Lys Arg Cys Trp Asp Pro Asn Pro Asp Arg Arg
340 345 350

Pro Glu Met Glu Glu Val Val Lys Leu Leu Glu Ala Ile Asp Thr Ser
355 360 365

Lys Gly Gly Gly Met Ile Ala Pro Asp Gln Phe Gln Gly Cys Leu Cys
370 375 380

Phe Phe Lys Pro Arg Gly Pro
385 390
(2) INFORMATION FOR SEQ ID NO: 359: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 341 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..341

(D) OTHER INFORMATION: / Ceres Seq. ID 1583484
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 359:



Met Gly Gln Thr Ser Thr Ser Ser Val Leu Thr Asn Ser Ala Ser Thr
1 5 10 15

Ser Ala Pro Ala Pro Ala Met Gln Glu Trp Glu Ile Asp Leu Ser Lys
20 25 30

Leu Asp Met Lys His Val Leu Ala His Gly Thr Tyr Gly Thr Val Tyr
35 40 45

Arg Gly Val Tyr Ala Gly Gln Glu Val Ala Val Lys Val Leu Asp Trp
50 55 60

Gly Glu Asp Gly Tyr Ala Thr Pro Ala Glu Thr Thr Thr Leu Arg Ala
65 70 75 80

Ser Phe Glu Gln Glu Val Ala Val Trp Gln Lys Leu Asp His Pro Asn
85 90 95

Val Thr Lys Phe Ile Gly Ala Ser Met Gly Thr Ser Asp Leu Arg Ile
100 105 110

Pro Pro Ala Gly Asp Thr Gly Gly Arg Gly Asn Gly Ala His Pro Ala
115 120 125

Arg Ala Cys Cys Val Val Val Glu Tyr Val Ala Gly Gly Thr Leu Lys
130 135 140

Lys Phe Leu Ile Lys Lys Tyr Arg Ala Lys Leu Pro Ile Lys Asp Val
145 150 155 160

Ile Gln Leu Ala Leu Asp Leu Ala Arg Gly Leu Ser Tyr Leu His Ser
165 170 175

Lys Ala Ile Val His Arg Asp Val Lys Ser Glu Asn Met Leu Leu Gln
180 185 190

Pro Asn Lys Thr Leu Lys Ile Ala Asp Phe Gly Val Ala Arg Val Glu
195 200 205

Ala Gln Asn Pro Gln Asp Met Thr Gly Gly Thr Gly Thr Leu Gly Tyr
210 215 220

Met Ala Pro Glu Val Leu Glu Gly Lys Pro Tyr Asn Arg Lys Cys Asp
225 230 235 240

Val Tyr Ser Phe Gly Val Cys Leu Trp Glu Ile Tyr Cys Cys Asp Met
245 250 255

Pro Tyr Ala Asp Cys Ser Phe Ala Glu Ile Ser His Ala Val Val His
260 265 270

Arg Asn Leu Arg Pro Glu Ile Pro Lys Cys Cys Pro His Ala Val Ala
275 280 285

Asn Ile Met Lys Arg Cys Trp Asp Pro Asn Pro Asp Arg Arg Pro Glu
290 295 300

Met Glu Glu Val Val Lys Leu Leu Glu Ala Ile Asp Thr Ser Lys Gly
305 310 315 320

Gly Gly Met Ile Ala Pro Asp Gln Phe Gln Gly Cys Leu Cys Phe Phe
325 330 335

Lys Pro Arg Gly Pro
340



(2) INFORMATION FOR SEQ ID NO: 360: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 319 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..319 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583485 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 360: 



Met Gln Glu Trp Glu Ile Asp Leu Ser Lys Leu Asp Met Lys His Val
1 5 10 15

Leu Ala His Gly Thr Tyr Gly Thr Val Tyr Arg Gly Val Tyr Ala Gly
20 25 30

Gln Glu Val Ala Val Lys Val Leu Asp Trp Gly Glu Asp Gly Tyr Ala
35 40 45

Thr Pro Ala Glu Thr Thr Thr Leu Arg Ala Ser Phe Glu Gln Glu Val
50 55 60

Ala Val Trp Gln Lys Leu Asp His Pro Asn Val Thr Lys Phe Ile Gly
65 70 75 80

Ala Ser Met Gly Thr Ser Asp Leu Arg Ile Pro Pro Ala Gly Asp Thr
85 90 95

Gly Gly Arg Gly Asn Gly Ala His Pro Ala Arg Ala Cys Cys Val Val
100 105 110

Val Glu Tyr Val Ala Gly Gly Thr Leu Lys Lys Phe Leu Ile Lys Lys
115 120 125

Tyr Arg Ala Lys Leu Pro Ile Lys Asp Val Ile Gln Leu Ala Leu Asp
130 135 140

Leu Ala Arg Gly Leu Ser Tyr Leu His Ser Lys Ala Ile Val His Arg
145 150 155 160

Asp Val Lys Ser Glu Asn Met Leu Leu Gln Pro Asn Lys Thr Leu Lys
165 170 175

Ile Ala Asp Phe Gly Val Ala Arg Val Glu Ala Gln Asn Pro Gln Asp
180 185 190






Met Thr Gly Gly Thr Gly Thr Leu Gly Tyr Met Ala Pro Glu Val Leu
195 200 205

Glu Gly Lys Pro Tyr Asn Arg Lys Cys Asp Val Tyr Ser Phe Gly Val
210 215 220

Cys Leu Trp Glu Ile Tyr Cys Cys Asp Met Pro Tyr Ala Asp Cys Ser
225 230 235 240

Phe Ala Glu Ile Ser His Ala Val Val His Arg Asn Leu Arg Pro Glu
245 250 255

Ile Pro Lys Cys Cys Pro His Ala Val Ala Asn Ile Met Lys Arg Cys
260 265 270

Trp Asp Pro Asn Pro Asp Arg Arg Pro Glu Met Glu Glu Val Val Lys
275 280 285

Leu Leu Glu Ala Ile Asp Thr Ser Lys Gly Gly Gly Met Ile Ala Pro
290 295 300

Asp Gln Phe Gln Gly Cys Leu Cys Phe Phe Lys Pro Arg Gly Pro
305 310 315












(2) INFORMATION FOR SEQ ID NO: 361:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 496 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..496

(D) OTHER INFORMATION: / Ceres Seq. ID 1583521
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 361:
attgttttaa ggcaaattaa gatcatcttc aaaatcttct cagatctctt ccaattttct 60
agaaaaaaca tgtcttgctg tggtggaagc tgtggttgtg gatctgcctg caagtgcggc 120
aatggttgcg gaggttgcaa aaggtaccct gacttggaga acaccgccac cgagactctt 180
gtcctcggtg ttgctccggc gatgaactct cagtacgagg cttccggcga gactttcgtt 240
gccgagaatg atgcctgcaa atgcggatct gactgcaagt gcaacccttg tacctgcaaa 300
tgaagaactt cataaaccct aagtctgtaa taaccctaat gttatgttag gtttgcttat 360
atgtaataat tggctgattt ttccggtagt tttgccggcg acgttggtct ttctcttctt 420
cttcttcttc tgtgtgtgtt tttgtttcca ttcttaagca catacacaaa catgctcaag 480
tgaaaaacca ctaaac

(2) INFORMATION FOR SEQ ID NO: 362:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 100 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..100

(D) OTHER INFORMATION: / Ceres Seq. ID 1583522
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 362:

Ile Val Leu Arg Gln Ile Lys Ile Ile Phe Lys Ile Phe Ser Asp Leu
1 5 10 15

Phe Gln Phe Ser Arg Lys Asn Met Ser Cys Cys Gly Gly Ser Cys Gly
20 25 30

Cys Gly Ser Ala Cys Lys Cys Gly Asn Gly Cys Gly Gly Cys Lys Arg
35 40 45

Tyr Pro Asp Leu Glu Asn Thr Ala Thr Glu Thr Leu Val Leu Gly Val
50 55 60

Ala Pro Ala Met Asn Ser Gln Tyr Glu Ala Ser Gly Glu Thr Phe Val
65 70 75 80

Ala Glu Asn Asp Ala Cys Lys Cys Gly Ser Asp Cys Lys Cys Asn Pro
85 90 95

Cys Thr Cys Lys
100

(2) INFORMATION FOR SEQ ID NO: 363:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 77 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..77

(D) OTHER INFORMATION: / Ceres Seq. ID 1583523
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 363:

Met Ser Cys Cys Gly Gly Ser Cys Gly Cys Gly Ser Ala Cys Lys Cys
1 5 10 15

Gly Asn Gly Cys Gly Gly Cys Lys Arg Tyr Pro Asp Leu Glu Asn Thr
20 25 30

Ala Thr Glu Thr Leu Val Leu Gly Val Ala Pro Ala Met Asn Ser Gln
35 40 45

Tyr Glu Ala Ser Gly Glu Thr Phe Val Ala Glu Asn Asp Ala Cys Lys
50 55 60

Cys Gly Ser Asp Cys Lys Cys Asn Pro Cys Thr Cys Lys
65 70 75












(2) INFORMATION FOR SEQ ID NO: 364:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 317 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..317

(D) OTHER INFORMATION: / Ceres Seq. ID 1583528
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 364:
aaattgaaga atttatcaaa gtcattttca ccctattcca agatcttgat ttcactttct 
tggagatgaa tcctttcact ctagttgatg gaagtcctta tcctctggat atgaggggtg 
agcttgatga tactgctgcc ttcaaaaact ttaaaaaatg gggcgacatt gaatttcctc 
tgccatttgg aagagtaatg agtcctacag aaagctttat ccacggactg gatgagaaga 
caagtgcgtc tttgaagttt accgttctga accccaaggg acggatttgg acaatggtag 
ctggtggagg agtatcg 

(2) INFORMATION FOR SEQ ID NO: 365:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 105 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..105

(D) OTHER INFORMATION: / Ceres Seq. ID 1583529
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 365:

Ile Glu Glu Phe Ile Lys Val Ile Phe Thr Leu Phe Gln Asp Leu Asp
1 5 10 15

Phe Thr Phe Leu Glu Met Asn Pro Phe Thr Leu Val Asp Gly Ser Pro
20 25 30

Tyr Pro Leu Asp Met Arg Gly Glu Leu Asp Asp Thr Ala Ala Phe Lys
35 40 45

Asn Phe Lys Lys Trp Gly Asp Ile Glu Phe Pro Leu Pro Phe Gly Arg
50 55 60

Val Met Ser Pro Thr Glu Ser Phe Ile His Gly Leu Asp Glu Lys Thr
65 70 75 80

Ser Ala Ser Leu Lys Phe Thr Val Leu Asn Pro Lys Gly Arg Ile Trp
85 90 95

Thr Met Val Ala Gly Gly Gly Val Ser
100 105
(2) INFORMATION FOR SEQ ID NO: 366:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 84 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..84

(D) OTHER INFORMATION: / Ceres Seq. ID 1583530
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 366:

Met Asn Pro Phe Thr Leu Val Asp Gly Ser Pro Tyr Pro Leu Asp Met
1 5 10 15

Arg Gly Glu Leu Asp Asp Thr Ala Ala Phe Lys Asn Phe Lys Lys Trp
20 25 30

Gly Asp Ile Glu Phe Pro Leu Pro Phe Gly Arg Val Met Ser Pro Thr
35 40 45

Glu Ser Phe Ile His Gly Leu Asp Glu Lys Thr Ser Ala Ser Leu Lys
50 55 60

Phe Thr Val Leu Asn Pro Lys Gly Arg Ile Trp Thr Met Val Ala Gly
65 70 75 80

Gly Gly Val Ser






















(2) INFORMATION FOR SEQ ID NO: 367:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 69 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..69

(D) OTHER INFORMATION: / Ceres Seq. ID 1583531
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 367:

Met Arg Gly Glu Leu Asp Asp Thr Ala Ala Phe Lys Asn Phe Lys Lys
1 5 10 15

Trp Gly Asp Ile Glu Phe Pro Leu Pro Phe Gly Arg Val Met Ser Pro
20 25 30

Thr Glu Ser Phe Ile His Gly Leu Asp Glu Lys Thr Ser Ala Ser Leu
35 40 45

Lys Phe Thr Val Leu Asn Pro Lys Gly Arg Ile Trp Thr Met Val Ala
50 55 60

Gly Gly Gly Val Ser
65

(2) INFORMATION FOR SEQ ID NO: 368:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1228 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1228

(D) OTHER INFORMATION: / Ceres Seq. ID 1583561
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 368:
ttcagaaagt gttcataact ggtgtcccca aactcaaaag atctgagatt gaagctgctt 60
gtgaagattt tagtaatgtc attgggtctt gccccattgg tacattgttc aaagggacgc 120
tatcaagtgg ggtggagata gctgtggctt ctgttgctac tgcgtctgcc aaagaatgga 180
caaataacat agaaatgcag ttcagaaaga agatcgaaat gttatccaag ataaaccaca 240
agaattttgt caaccttctt ggttactgtg aagaagaaga acctttcact aggatcttgg 300
tctttgaata tgcatcaaac ggaacagtct ttgaacattt acactataaa gaatcagaac 360
acttggactg ggtaatgcgg ctaagaatag ccatgggcat agcttattgc cttgaccata 420
tgcacggact caagccacct atagtccaca gcaatcttct ctcatcatca gttcaactca 480
cagaggacta tgcagtcaaa attgcggatt tcaattttgg atacctaaaa ggcccatccg 540
agacagaaag cagcaccaat gcactcatag atacaaacat ctcagaaaca acacaagaag 600
acaatgttca cagcttcggg ttgctgttgt ttgaactgat gacaggaaaa ctcccggagt 660
cggttaaaaa aggcgactcg atagataccg gattggctgt cttcttgaga ggaaagacat 720
taagggagat ggtggatccg acaattgaaa gctttgacga gaagattgag aatataggtg 780
aagtgatcaa aagctgcata agagcagacg cgaaacagag accgataatg aaggaagtca 840
cagggagatt acgagagatc actggattat caccagacga cactatccca aaactttcac 900
cgctctggtg ggcagagctg gaagttctgt ycactgcgtg aagagacaac tactgaactt 960
cacaaaaaaa tctgtaagta ttaatatgaa gatttgagtg aggtttttga gtctcttaga 1020
agctcttggc ttctctttag gctactttca tctatcaatc tatatataag tatggaactt 1080
agtttatata acggctttaa aattcggtgg atctatttgg tcgtttatgt cgtaaacaga 1140
aaacggtaat accccttttt ggtgtgcata atgttggtgg gtggttactg tctttgtaaa 1200
ttagtgttac tatgttcttg atatgttt

(2) INFORMATION FOR SEQ ID NO: 369:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 312 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..312

(D) OTHER INFORMATION: / Ceres Seq. ID 1583562
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 369:

Gln Lys Val Phe Ile Thr Gly Val Pro Lys Leu Lys Arg Ser Glu Ile
1 5 10 15

Glu Ala Ala Cys Glu Asp Phe Ser Asn Val Ile Gly Ser Cys Pro Ile
20 25 30

Gly Thr Leu Phe Lys Gly Thr Leu Ser Ser Gly Val Glu Ile Ala Val
35 40 45

Ala Ser Val Ala Thr Ala Ser Ala Lys Glu Trp Thr Asn Asn Ile Glu
50 55 60

Met Gln Phe Arg Lys Lys Ile Glu Met Leu Ser Lys Ile Asn His Lys
65 70 75 80

Asn Phe Val Asn Leu Leu Gly Tyr Cys Glu Glu Glu Glu Pro Phe Thr
85 90 95

Arg Ile Leu Val Phe Glu Tyr Ala Ser Asn Gly Thr Val Phe Glu His
100 105 110

Leu His Tyr Lys Glu Ser Glu His Leu Asp Trp Val Met Arg Leu Arg
115 120 125

Ile Ala Met Gly Ile Ala Tyr Cys Leu Asp His Met His Gly Leu Lys
130 135 140

Pro Pro Ile Val His Ser Asn Leu Leu Ser Ser Ser Val Gln Leu Thr
145 150 155 160

Glu Asp Tyr Ala Val Lys Ile Ala Asp Phe Asn Phe Gly Tyr Leu Lys
165 170 175

Gly Pro Ser Glu Thr Glu Ser Ser Thr Asn Ala Leu Ile Asp Thr Asn
180 185 190

Ile Ser Glu Thr Thr Gln Glu Asp Asn Val His Ser Phe Gly Leu Leu
195 200 205

Leu Phe Glu Leu Met Thr Gly Lys Leu Pro Glu Ser Val Lys Lys Gly
210 215 220

Asp Ser Ile Asp Thr Gly Leu Ala Val Phe Leu Arg Gly Lys Thr Leu
225 230 235 240

Arg Glu Met Val Asp Pro Thr Ile Glu Ser Phe Asp Glu Lys Ile Glu
245 250 255

Asn Ile Gly Glu Val Ile Lys Ser Cys Ile Arg Ala Asp Ala Lys Gln
260 265 270

Arg Pro Ile Met Lys Glu Val Thr Gly Arg Leu Arg Glu Ile Thr Gly
275 280 285

Leu Ser Pro Asp Asp Thr Ile Pro Lys Leu Ser Pro Leu Trp Trp Ala
290 295 300

Glu Leu Glu Val Leu Xaa Thr Ala
305 310
(2) INFORMATION FOR SEQ ID NO: 370:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 248 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..248

(D) OTHER INFORMATION: / Ceres Seq. ID 1583563
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 370:

Met Gln Phe Arg Lys Lys Ile Glu Met Leu Ser Lys Ile Asn His Lys
1 5 10 15

Asn Phe Val Asn Leu Leu Gly Tyr Cys Glu Glu Glu Glu Pro Phe Thr
20 25 30

Arg Ile Leu Val Phe Glu Tyr Ala Ser Asn Gly Thr Val Phe Glu His
35 40 45

Leu His Tyr Lys Glu Ser Glu His Leu Asp Trp Val Met Arg Leu Arg
50 55 60

Ile Ala Met Gly Ile Ala Tyr Cys Leu Asp His Met His Gly Leu Lys
65 70 75 80

Pro Pro Ile Val His Ser Asn Leu Leu Ser Ser Ser Val Gln Leu Thr
85 90 95

Glu Asp Tyr Ala Val Lys Ile Ala Asp Phe Asn Phe Gly Tyr Leu Lys
100 105 110

Gly Pro Ser Glu Thr Glu Ser Ser Thr Asn Ala Leu Ile Asp Thr Asn
115 120 125

Ile Ser Glu Thr Thr Gln Glu Asp Asn Val His Ser Phe Gly Leu Leu
130 135 140

Leu Phe Glu Leu Met Thr Gly Lys Leu Pro Glu Ser Val Lys Lys Gly
145 150 155 160

Asp Ser Ile Asp Thr Gly Leu Ala Val Phe Leu Arg Gly Lys Thr Leu
165 170 175

Arg Glu Met Val Asp Pro Thr Ile Glu Ser Phe Asp Glu Lys Ile Glu
180 185 190

Asn Ile Gly Glu Val Ile Lys Ser Cys Ile Arg Ala Asp Ala Lys Gln
195 200 205

Arg Pro Ile Met Lys Glu Val Thr Gly Arg Leu Arg Glu Ile Thr Gly
210 215 220

Leu Ser Pro Asp Asp Thr Ile Pro Lys Leu Ser Pro Leu Trp Trp Ala
225 230 235 240

Glu Leu Glu Val Leu Xaa Thr Ala
245


















(2) INFORMATION FOR SEQ ID NO: 371:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 240 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..240

(D) OTHER INFORMATION: / Ceres Seq. ID 1583564
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 371:

Met Leu Ser Lys Ile Asn His Lys Asn Phe Val Asn Leu Leu Gly Tyr
1 5 10 15

Cys Glu Glu Glu Glu Pro Phe Thr Arg Ile Leu Val Phe Glu Tyr Ala
20 25 30

Ser Asn Gly Thr Val Phe Glu His Leu His Tyr Lys Glu Ser Glu His
35 40 45

Leu Asp Trp Val Met Arg Leu Arg Ile Ala Met Gly Ile Ala Tyr Cys
50 55 60

Leu Asp His Met His Gly Leu Lys Pro Pro Ile Val His Ser Asn Leu
65 70 75 80

Leu Ser Ser Ser Val Gln Leu Thr Glu Asp Tyr Ala Val Lys Ile Ala
85 90 95

Asp Phe Asn Phe Gly Tyr Leu Lys Gly Pro Ser Glu Thr Glu Ser Ser
100 105 110

Thr Asn Ala Leu Ile Asp Thr Asn Ile Ser Glu Thr Thr Gln Glu Asp
115 120 125

Asn Val His Ser Phe Gly Leu Leu Leu Phe Glu Leu Met Thr Gly Lys
130 135 140

Leu Pro Glu Ser Val Lys Lys Gly Asp Ser Ile Asp Thr Gly Leu Ala
145 150 155 160

Val Phe Leu Arg Gly Lys Thr Leu Arg Glu Met Val Asp Pro Thr Ile
165 170 175

Glu Ser Phe Asp Glu Lys Ile Glu Asn Ile Gly Glu Val Ile Lys Ser
180 185 190

Cys Ile Arg Ala Asp Ala Lys Gln Arg Pro Ile Met Lys Glu Val Thr
195 200 205

Gly Arg Leu Arg Glu Ile Thr Gly Leu Ser Pro Asp Asp Thr Ile Pro
210 215 220

Lys Leu Ser Pro Leu Trp Trp Ala Glu Leu Glu Val Leu Xaa Thr Ala
225 230 235 240


(2) INFORMATION FOR SEQ ID NO: 372:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 723 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..723

(D) OTHER INFORMATION: / Ceres Seq. ID 1583637
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 372:
atggctgacg aacagaatca acagaatggc cctgccaaca ttggtattgt agatgcacca 60
cgagatcacc ttcagaggaa agaaattgca cctcctgcta tcttgaacaa caacttcgag 120
attaagagtg gtctcatctc gatgattcag gggaacaaat tccatgatct gccaatggaa 180
gatccactcg accaccttga tgaatttgat aggctttgca acctaacaaa aatcaatggt 240
gttagtgaag acggattcaa gctccgtttg tttccattct ccttgggcga caaagcccac 300
atctgggaga agaatctgcc ccacgactca atcaccacct gggatgattg caagaaggct 360
tttctagcaa agttcttctc caacgccaga actacaagac tcagaaatga gatttttggc 420
ttttcaaaga agactgcaat gggaatcaca agaacaaaga tgttgaagaa tgctgggaat 480
tggttgagaa cttggctcaa tcagatggta attacaacga ggactgagac aggaccatta 540
gaggcacagt atcaggtcca agatggggag ggtaaccagt tagaagaagt tagctacatc 600
aacaacaacc agggtggcta caaagggtat aacaacttca agaccaacaa ccccaacctc 660
tcttaccgca gcaccaacgt tgctaaccca caggaccaag tgtatcctcc acagcaacaa 720
tga

(2) INFORMATION FOR SEQ ID NO: 373:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 240 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..240

(D) OTHER INFORMATION: / Ceres Seq. ID 1583638
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 373:

Met Ala Asp Glu Gln Asn Gln Gln Asn Gly Pro Ala Asn Ile Gly Ile
1 5 10 15

Val Asp Ala Pro Arg Asp His Leu Gln Arg Lys Glu Ile Ala Pro Pro
20 25 30

Ala Ile Leu Asn Asn Asn Phe Glu Ile Lys Ser Gly Leu Ile Ser Met
35 40 45

Ile Gln Gly Asn Lys Phe His Asp Leu Pro Met Glu Asp Pro Leu Asp
50 55 60

His Leu Asp Glu Phe Asp Arg Leu Cys Asn Leu Thr Lys Ile Asn Gly
65 70 75 80

Val Ser Glu Asp Gly Phe Lys Leu Arg Leu Phe Pro Phe Ser Leu Gly
85 90 95

Asp Lys Ala His Ile Trp Glu Lys Asn Leu Pro His Asp Ser Ile Thr
100 105 110

Thr Trp Asp Asp Cys Lys Lys Ala Phe Leu Ala Lys Phe Phe Ser Asn
115 120 125

Ala Arg Thr Thr Arg Leu Arg Asn Glu Ile Phe Gly Phe Ser Lys Lys
130 135 140

Thr Ala Met Gly Ile Thr Arg Thr Lys Met Leu Lys Asn Ala Gly Asn
145 150 155 160

Trp Leu Arg Thr Trp Leu Asn Gln Met Val Ile Thr Thr Arg Thr Glu
165 170 175

Thr Gly Pro Leu Glu Ala Gln Tyr Gln Val Gln Asp Gly Glu Gly Asn
180 185 190

Gln Leu Glu Glu Val Ser Tyr Ile Asn Asn Asn Gln Gly Gly Tyr Lys
195 200 205

Gly Tyr Asn Asn Phe Lys Thr Asn Asn Pro Asn Leu Ser Tyr Arg Ser
210 215 220

Thr Asn Val Ala Asn Pro Gln Asp Gln Val Tyr Pro Pro Gln Gln Gln
225 230 235 240



(2) INFORMATION FOR SEQ ID NO: 374:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 193 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..193

(D) OTHER INFORMATION: / Ceres Seq. ID 1583640
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 374:

Met Ile Gln Gly Asn Lys Phe His Asp Leu Pro Met Glu Asp Pro Leu
1 5 10 15

Asp His Leu Asp Glu Phe Asp Arg Leu Cys Asn Leu Thr Lys Ile Asn
20 25 30

Gly Val Ser Glu Asp Gly Phe Lys Leu Arg Leu Phe Pro Phe Ser Leu
35 40 45

Gly Asp Lys Ala His Ile Trp Glu Lys Asn Leu Pro His Asp Ser Ile
50 55 60

Thr Thr Trp Asp Asp Cys Lys Lys Ala Phe Leu Ala Lys Phe Phe Ser
65 70 75 80

Asn Ala Arg Thr Thr Arg Leu Arg Asn Glu Ile Phe Gly Phe Ser Lys
85 90 95

Lys Thr Ala Met Gly Ile Thr Arg Thr Lys Met Leu Lys Asn Ala Gly
100 105 110

Asn Trp Leu Arg Thr Trp Leu Asn Gln Met Val Ile Thr Thr Arg Thr
115 120 125

Glu Thr Gly Pro Leu Glu Ala Gln Tyr Gln Val Gln Asp Gly Glu Gly
130 135 140

Asn Gln Leu Glu Glu Val Ser Tyr Ile Asn Asn Asn Gln Gly Gly Tyr
145 150 155 160

Lys Gly Tyr Asn Asn Phe Lys Thr Asn Asn Pro Asn Leu Ser Tyr Arg
165 170 175

Ser Thr Asn Val Ala Asn Pro Gln Asp Gln Val Tyr Pro Pro Gln Gln
180 185 190

Gln

(2) INFORMATION FOR SEQ ID NO: 375: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1366 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1366 

(D) OTHER INFORMATION: / Ceres Seq. ID 1583860 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 375: 
ttatattcgg tggtctaata agattcgtca gcttggtttt ccaatggcaa tggtacaaaa 60
ccatagaaaa gtggagcaaa gaaaaacaag taaatatgtc ttccactcta cttccacaag 120
gagagctctc aaatcatcct ccaattctcg gattatcaaa ctccaagccg tcgccgaaac 180
tagctctgaa atcgaaagca gtagtgtcac tgagccaact gctttgactc tacgtcaaat 240
ctgccaaggt tatgtgcccg agcatatctt gcacaggcga gcttcgtttc tctcckttta 300
tttctgtaga attttgaatt ttgacaagta tttaaaattt cgaacttttt gtgaattgat 360
gcagaatgga ggagattgga caggttcagg aaagacattg acttacctgc tactcatatt 420
ctctcttata aaccctcaac gatcttctgt gcaagctgtt attgttgttc ccactcgaga 480
gctcggtatg caagttacaa aagttgctcg aatgttggct gcaaaatcgg agattgatgt 540
gaaaggatgt actgtgatgg ctctcttaga tggagggacg ctaagaaggc acaaaagctg 600
gctaaaggct gagccaccag caattttggt tgctacggta gcaagtttgt gtcacatgct 660
agaaaagcac atatttagaa ttgactcggt gagagttctt gttgtagatg aggttgattt 720
cttattctac tcatcaaaac aagttggttc tgtgcggaag cttttgacat cattttcttc 780
atgcgataaa cgtcagacag tttttgcaag tgcttctatt ccccagcata aacattttgt 840
gcatgactgt atacaacaaa agtggacaaa gatgtgtgag aagacaaaca agcatcaagt 900
actacttgcc ttgttagagt ctgatgcacc tgaatcagct atcatttttg tcggcgagca 960
gtctgagaag tcaaaaaagg ctggaaatga tccatctaca actctactaa tgaaattcct 1020
aaaaacttca tataaaggct cgctggagat cctcctacta gagggggata tgaatttcaa 1080
ttcacgagca gcttcactaa cagaaatcag gcaaggagga gggtttcttc ttgtttctac 1140
tgatattgca gcaagaggga ttgatctacc agaaacaact catattttta actttgatct 1200
cccacaaacg gttacagatt atctgcaccg agctggaaga gctggtcgaa aacccttttc 1260
tgacaggaag tgcattgtca ctaatctgat cacctcggag gaaagatttg tcttgcaaag 1320
atacgaaaat gaacttatgt tcagctgcga ggaaatgatg ttgtag
(2) INFORMATION FOR SEQ ID NO: 376:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 454 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..454

(D) OTHER INFORMATION: / Ceres Seq. ID 1583861
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 376:

Tyr Ile Arg Trp Ser Asn Lys Ile Arg Gln Leu Gly Phe Pro Met Ala
1 5 10 15

Met Val Gln Asn His Arg Lys Val Glu Gln Arg Lys Thr Ser Lys Tyr
20 25 30

Val Phe His Ser Thr Ser Thr Arg Arg Ala Leu Lys Ser Ser Ser Asn
35 40 45

Ser Arg Ile Ile Lys Leu Gln Ala Val Ala Glu Thr Ser Ser Glu Ile
50 55 60

Glu Ser Ser Ser Val Thr Glu Pro Thr Ala Leu Thr Leu Arg Gln Ile
65 70 75 80

Cys Gln Gly Tyr Val Pro Glu His Ile Leu His Arg Arg Ala Ser Phe
85 90 95

Leu Ser Xaa Tyr Phe Cys Arg Ile Leu Asn Phe Asp Lys Tyr Leu Lys
100 105 110

Phe Arg Thr Phe Cys Glu Leu Met Gln Asn Gly Gly Asp Trp Thr Gly
115 120 125

Ser Gly Lys Thr Leu Thr Tyr Leu Leu Leu Ile Phe Ser Leu Ile Asn
130 135 140

Pro Gln Arg Ser Ser Val Gln Ala Val Ile Val Val Pro Thr Arg Glu
145 150 155 160

Leu Gly Met Gln Val Thr Lys Val Ala Arg Met Leu Ala Ala Lys Ser
165 170 175

Glu Ile Asp Val Lys Gly Cys Thr Val Met Ala Leu Leu Asp Gly Gly
180 185 190

Thr Leu Arg Arg His Lys Ser Trp Leu Lys Ala Glu Pro Pro Ala Ile
195 200 205

Leu Val Ala Thr Val Ala Ser Leu Cys His Met Leu Glu Lys His Ile
210 215 220

Phe Arg Ile Asp Ser Val Arg Val Leu Val Val Asp Glu Val Asp Phe
225 230 235 240

Leu Phe Tyr Ser Ser Lys Gln Val Gly Ser Val Arg Lys Leu Leu Thr
245 250 255

Ser Phe Ser Ser Cys Asp Lys Arg Gln Thr Val Phe Ala Ser Ala Ser
260 265 270

Ile Pro Gln His Lys His Phe Val His Asp Cys Ile Gln Gln Lys Trp
275 280 285

Thr Lys Met Cys Glu Lys Thr Asn Lys His Gln Val Leu Leu Ala Leu
290 295 300

Leu Glu Ser Asp Ala Pro Glu Ser Ala Ile Ile Phe Val Gly Glu Gln
305 310 315 320

Ser Glu Lys Ser Lys Lys Ala Gly Asn Asp Pro Ser Thr Thr Leu Leu
325 330 335

Met Lys Phe Leu Lys Thr Ser Tyr Lys Gly Ser Leu Glu Ile Leu Leu
340 345 350

Leu Glu Gly Asp Met Asn Phe Asn Ser Arg Ala Ala Ser Leu Thr Glu
355 360 365

Ile Arg Gln Gly Gly Gly Phe Leu Leu Val Ser Thr Asp Ile Ala Ala
370 375 380

Arg Gly Ile Asp Leu Pro Glu Thr Thr His Ile Phe Asn Phe Asp Leu
385 390 395 400

Pro Gln Thr Val Thr Asp Tyr Leu His Arg Ala Gly Arg Ala Gly Arg
405 410 415

Lys Pro Phe Ser Asp Arg Lys Cys Ile Val Thr Asn Leu Ile Thr Ser
420 425 430

Glu Glu Arg Phe Val Leu Gln Arg Tyr Glu Asn Glu Leu Met Phe Ser
435 440 445

Cys Glu Glu Met Met Leu
450

(2) INFORMATION FOR SEQ ID NO: 377:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 440 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..440

(D) OTHER INFORMATION: / Ceres Seq. ID 1583862
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 377:

Met Ala Met Val Gln Asn His Arg Lys Val Glu Gln Arg Lys Thr Ser
1 5 10 15

Lys Tyr Val Phe His Ser Thr Ser Thr Arg Arg Ala Leu Lys Ser Ser
20 25 30

Ser Asn Ser Arg Ile Ile Lys Leu Gln Ala Val Ala Glu Thr Ser Ser
35 40 45

Glu Ile Glu Ser Ser Ser Val Thr Glu Pro Thr Ala Leu Thr Leu Arg
50 55 60

Gln Ile Cys Gln Gly Tyr Val Pro Glu His Ile Leu His Arg Arg Ala
65 70 75 80

Ser Phe Leu Ser Xaa Tyr Phe Cys Arg Ile Leu Asn Phe Asp Lys Tyr
85 90 95

Leu Lys Phe Arg Thr Phe Cys Glu Leu Met Gln Asn Gly Gly Asp Trp
100 105 110

Thr Gly Ser Gly Lys Thr Leu Thr Tyr Leu Leu Leu Ile Phe Ser Leu
115 120 125

Ile Asn Pro Gln Arg Ser Ser Val Gln Ala Val Ile Val Val Pro Thr
130 135 140

Arg Glu Leu Gly Met Gln Val Thr Lys Val Ala Arg Met Leu Ala Ala
145 150 155 160

Lys Ser Glu Ile Asp Val Lys Gly Cys Thr Val Met Ala Leu Leu Asp
165 170 175

Gly Gly Thr Leu Arg Arg His Lys Ser Trp Leu Lys Ala Glu Pro Pro
180 185 190

Ala Ile Leu Val Ala Thr Val Ala Ser Leu Cys His Met Leu Glu Lys
195 200 205

His Ile Phe Arg Ile Asp Ser Val Arg Val Leu Val Val Asp Glu Val
210 215 220

Asp Phe Leu Phe Tyr Ser Ser Lys Gln Val Gly Ser Val Arg Lys Leu
225 230 235 240

Leu Thr Ser Phe Ser Ser Cys Asp Lys Arg Gln Thr Val Phe Ala Ser
245 250 255

Ala Ser Ile Pro Gln His Lys His Phe Val His Asp Cys Ile Gln Gln
260 265 270

Lys Trp Thr Lys Met Cys Glu Lys Thr Asn Lys His Gln Val Leu Leu
275 280 285

Ala Leu Leu Glu Ser Asp Ala Pro Glu Ser Ala Ile Ile Phe Val Gly
290 295 300






Glu Gln Ser Glu Lys Ser Lys Lys Ala Gly Asn Asp Pro Ser Thr Thr
305 310 315 320

Leu Leu Met Lys Phe Leu Lys Thr Ser Tyr Lys Gly Ser Leu Glu Ile
325 330 335

Leu Leu Leu Glu Gly Asp Met Asn Phe Asn Ser Arg Ala Ala Ser Leu
340 345 350

Thr Glu Ile Arg Gln Gly Gly Gly Phe Leu Leu Val Ser Thr Asp Ile
355 360 365

Ala Ala Arg Gly Ile Asp Leu Pro Glu Thr Thr His Ile Phe Asn Phe
370 375 380

Asp Leu Pro Gln Thr Val Thr Asp Tyr Leu His Arg Ala Gly Arg Ala
385 390 395 400

Gly Arg Lys Pro Phe Ser Asp Arg Lys Cys Ile Val Thr Asn Leu Ile
405 410 415

Thr Ser Glu Glu Arg Phe Val Leu Gln Arg Tyr Glu Asn Glu Leu Met
420 425 430

Phe Ser Cys Glu Glu Met Met Leu
435 440



(2) INFORMATION FOR SEQ ID NO: 378:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 438 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..438

(D) OTHER INFORMATION: / Ceres Seq. ID 1583863
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 378:

Met Val Gln Asn His Arg Lys Val Glu Gln Arg Lys Thr Ser Lys Tyr
1 5 10 15

Val Phe His Ser Thr Ser Thr Arg Arg Ala Leu Lys Ser Ser Ser Asn
20 25 30

Ser Arg Ile Ile Lys Leu Gln Ala Val Ala Glu Thr Ser Ser Glu Ile
35 40 45

Glu Ser Ser Ser Val Thr Glu Pro Thr Ala Leu Thr Leu Arg Gln Ile
50 55 60

Cys Gln Gly Tyr Val Pro Glu His Ile Leu His Arg Arg Ala Ser Phe
65 70 75 80

Leu Ser Xaa Tyr Phe Cys Arg Ile Leu Asn Phe Asp Lys Tyr Leu Lys
85 90 95

Phe Arg Thr Phe Cys Glu Leu Met Gln Asn Gly Gly Asp Trp Thr Gly
100 105 110

Ser Gly Lys Thr Leu Thr Tyr Leu Leu Leu Ile Phe Ser Leu Ile Asn
115 120 125

Pro Gln Arg Ser Ser Val Gln Ala Val Ile Val Val Pro Thr Arg Glu
130 135 140

Leu Gly Met Gln Val Thr Lys Val Ala Arg Met Leu Ala Ala Lys Ser
145 150 155 160

Glu Ile Asp Val Lys Gly Cys Thr Val Met Ala Leu Leu Asp Gly Gly
165 170 175

Thr Leu Arg Arg His Lys Ser Trp Leu Lys Ala Glu Pro Pro Ala Ile
180 185 190

Leu Val Ala Thr Val Ala Ser Leu Cys His Met Leu Glu Lys His Ile
195 200 205

Phe Arg Ile Asp Ser Val Arg Val Leu Val Val Asp Glu Val Asp Phe
210 215 220

Leu Phe Tyr Ser Ser Lys Gln Val Gly Ser Val Arg Lys Leu Leu Thr
225 230 235 240

Ser Phe Ser Ser Cys Asp Lys Arg Gln Thr Val Phe Ala Ser Ala Ser
245 250 255

Ile Pro Gln His Lys His Phe Val His Asp Cys Ile Gln Gln Lys Trp
260 265 270

Thr Lys Met Cys Glu Lys Thr Asn Lys His Gln Val Leu Leu Ala Leu
275 280 285

Leu Glu Ser Asp Ala Pro Glu Ser Ala Ile Ile Phe Val Gly Glu Gln
290 295 300

Ser Glu Lys Ser Lys Lys Ala Gly Asn Asp Pro Ser Thr Thr Leu Leu
305 310 315 320

Met Lys Phe Leu Lys Thr Ser Tyr Lys Gly Ser Leu Glu Ile Leu Leu
325 330 335




Leu 


Glu 


Gly 


Asp 


Met 


Asn 


Phe 


Asn 


Ser 


Arg 


Ala 


Ala 


Ser 


Leu 


Thr 


Glu 






340 










345 










350 






He 


Arg 


Gin 


Gly 


Gly Gly 


Phe 


Leu 


Leu 


Val 


Ser 


Thr 


Asp 


He 


Ala 


Ala 




355 










360 










365 








Arg 


Gly 


He 


Asp 


Leu 


Pro 


Glu 


Thr 


Thr 


His 


He 


Phe 


Asn 


Phe 


Asp 


Leu 


370 










375 










380 










Pro 


Gin 


Thr 


Val 


Thr 


Asp 


Tyr 


Leu 


His 


Arg 


Ala 


Gly 


Arg 


Ala 


Gly 


Arg 


385 










390 










395 










400 


Lys 


Pro 


Phe 


Ser 


Asp 


Arg 


Lys 


Cys 


He 


Val 


Thr 


Asn 


Leu 


He 


Thr 


Ser 








405 










410 










4 ± O 




Glu 


Glu 


Arg 


Phe 


Val 


Leu 


Gin 


Arg 


Tyr 


Glu 


Asn 


Glu 


Leu 


Met 


Phe 


Ser 






420 










425 










430 






Cys 


Glu 


Glu 
435 


Met 


Met 


Leu 






















(2) INFORMATION FOR SEQ ID NO: 379:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1122 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 1..1122
(D) OTHER INFORMATION: / Ceres Seq. ID 158391
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 379:
atggcggagg ctattggtgc tgtatctttg gtgggtcatc gaccttcaat tgtgaggatt 60
acagtgaaga atgaattgaa aacgcagaaa tcacagagca ttgttcgctt tccggtgaag 120
gtagattata gtgcgaaagg tgttctctct catttgatga cgcagagtgt gaaaaagaat 180
cgaatgtcgg tgtttccgat cagggctctt gctatggaat tgacgaaaga gaagaagaaa 240
gacgatcggt taccgaaaac ttggaattat cttgattctg gtgctgatga taagcctagt 300
ttatggcctc ctgagaacaa agctgataag ccttcattgc ataatccttt acttaggcaa 360
gagcggatgg gttgtggttg gttaggtgct atatttgagt gggaaggagt attgattgaa 420
gacaatcctg acttggataa ccaatcatgg cttactttag ctcaggaaga agggaaatct 480
cctcctccgg cttttatgct cagacgtgtt gaagggatga agaacgagca ggcgatatct 540
gaggttctgt gttggtcgag agatcctgtt caagtgagaa ggatggctaa gcgtaaagaa 600
gagatcttta aagcactaca tggaggagtg tatagactaa gagatgggtc gcaggagttt 660
gtgaatgtct tgatgaataa taagatccct atggctttgg tatcgactcg tcctcgggaa 720
acattggaga atgctgttgg atcgataggt atcagaaagt tcttcagtgt gatagttgca 780
tcggaagatg tttacagagg taaaccggat cctgagatgt tcatttacgc agcacagctt 840
cttgatttca taccggaacg ttgcattgta tttggaaact caaaccagac catagaggca 900
gctcacgatg ggaggatgaa atgtgtggct gtggcgagta aacacccgat ttatgagctt 960
ggtgcagctg agctggtggt aagaagacta gacgagttat cgattatcga tttgaagaaa 1020
cttgcagata ccgatttgac agagttcgaa ccggaattgg agatggaaaa ggaagatgag 1080
cgtgagctgc cttcatcggc tgtagcagtt gatgatttct ga
(2) INFORMATION FOR SEQ ID NO: 380:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 373 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..373
(D) OTHER INFORMATION: / Ceres Seq. ID 1583912
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 380:
Met Ala Glu Ala Ile Gly Ala Val Ser Leu Val Gly His Arg Pro Ser
1 5 10 15

Ile Val Arg Ile Thr Val Lys Asn Glu Leu Lys Thr Gln Lys Ser Gln
20 25 30

Ser Ile Val Arg Phe Pro Val Lys Val Asp Tyr Ser Ala Lys Gly Val
35 40 45

Leu Ser His Leu Met Thr Gln Ser Val Lys Lys Asn Arg Met Ser Val
50 55 60

Phe Pro Ile Arg Ala Leu Ala Met Glu Leu Thr Lys Glu Lys Lys Lys
65 70 75 80

Asp Asp Arg Leu Pro Lys Thr Trp Asn Tyr Leu Asp Ser Gly Ala Asp
85 90 95

Asp Lys Pro Ser Leu Trp Pro Pro Glu Asn Lys Ala Asp Lys Pro Ser
100 105 110

Leu His Asn Pro Leu Leu Arg Gln Glu Arg Met Gly Cys Gly Trp Leu
115 120 125

Gly Ala Ile Phe Glu Trp Glu Gly Val Leu Ile Glu Asp Asn Pro Asp
130 135 140

Leu Asp Asn Gln Ser Trp Leu Thr Leu Ala Gln Glu Glu Gly Lys Ser
145 150 155 160

Pro Pro Pro Ala Phe Met Leu Arg Arg Val Glu Gly Met Lys Asn Glu
165 170 175

Gln Ala Ile Ser Glu Val Leu Cys Trp Ser Arg Asp Pro Val Gln Val
180 185 190

Arg Arg Met Ala Lys Arg Lys Glu Glu Ile Phe Lys Ala Leu His Gly
195 200 205

Gly Val Tyr Arg Leu Arg Asp Gly Ser Gln Glu Phe Val Asn Val Leu
210 215 220

Met Asn Asn Lys Ile Pro Met Ala Leu Val Ser Thr Arg Pro Arg Glu
225 230 235 240

Thr Leu Glu Asn Ala Val Gly Ser Ile Gly Ile Arg Lys Phe Phe Ser
245 250 255

Val Ile Val Ala Ser Glu Asp Val Tyr Arg Gly Lys Pro Asp Pro Glu
260 265 270

Met Phe Ile Tyr Ala Ala Gln Leu Leu Asp Phe Ile Pro Glu Arg Cys
275 280 285

Ile Val Phe Gly Asn Ser Asn Gln Thr Ile Glu Ala Ala His Asp Gly
290 295 300

Arg Met Lys Cys Val Ala Val Ala Ser Lys His Pro Ile Tyr Glu Leu
305 310 315 320

Gly Ala Ala Glu Leu Val Val Arg Arg Leu Asp Glu Leu Ser Ile Ile
325 330 335

Asp Leu Lys Lys Leu Ala Asp Thr Asp Leu Thr Glu Phe Glu Pro Glu
340 345 350

Leu Glu Met Glu Lys Glu Asp Glu Arg Glu Leu Pro Ser Ser Ala Val
355 360 365

Ala Val Asp Asp Phe
370

(2) INFORMATION FOR SEQ ID NO: 381: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 321 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..321
(D) OTHER INFORMATION: / Ceres Seq. ID 1583914
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 381:
Met Thr Gln Ser Val Lys Lys Asn Arg Met Ser Val Phe Pro Ile Arg
1 5 10 15

Ala Leu Ala Met Glu Leu Thr Lys Glu Lys Lys Lys Asp Asp Arg Leu
20 25 30

Pro Lys Thr Trp Asn Tyr Leu Asp Ser Gly Ala Asp Asp Lys Pro Ser
35 40 45

Leu Trp Pro Pro Glu Asn Lys Ala Asp Lys Pro Ser Leu His Asn Pro
50 55 60

Leu Leu Arg Gln Glu Arg Met Gly Cys Gly Trp Leu Gly Ala Ile Phe
65 70 75 80

Glu Trp Glu Gly Val Leu Ile Glu Asp Asn Pro Asp Leu Asp Asn Gln
85 90 95

Ser Trp Leu Thr Leu Ala Gln Glu Glu Gly Lys Ser Pro Pro Pro Ala
100 105 110

Phe Met Leu Arg Arg Val Glu Gly Met Lys Asn Glu Gln Ala Ile Ser
115 120 125

Glu Val Leu Cys Trp Ser Arg Asp Pro Val Gln Val Arg Arg Met Ala
130 135 140

Lys Arg Lys Glu Glu Ile Phe Lys Ala Leu His Gly Gly Val Tyr Arg
145 150 155 160

Leu Arg Asp Gly Ser Gln Glu Phe Val Asn Val Leu Met Asn Asn Lys
165 170 175

Ile Pro Met Ala Leu Val Ser Thr Arg Pro Arg Glu Thr Leu Glu Asn
180 185 190

Ala Val Gly Ser Ile Gly Ile Arg Lys Phe Phe Ser Val Ile Val Ala
195 200 205

Ser Glu Asp Val Tyr Arg Gly Lys Pro Asp Pro Glu Met Phe Ile Tyr
210 215 220

Ala Ala Gln Leu Leu Asp Phe Ile Pro Glu Arg Cys Ile Val Phe Gly
225 230 235 240

Asn Ser Asn Gln Thr Ile Glu Ala Ala His Asp Gly Arg Met Lys Cys
245 250 255

Val Ala Val Ala Ser Lys His Pro Ile Tyr Glu Leu Gly Ala Ala Glu
260 265 270

Leu Val Val Arg Arg Leu Asp Glu Leu Ser Ile Ile Asp Leu Lys Lys
275 280 285

Leu Ala Asp Thr Asp Leu Thr Glu Phe Glu Pro Glu Leu Glu Met Glu
290 295 300

Lys Glu Asp Glu Arg Glu Leu Pro Ser Ser Ala Val Ala Val Asp Asp
305 310 315 320

Phe

(2) INFORMATION FOR SEQ ID NO: 382:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 420 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..420
(D) OTHER INFORMATION: / Ceres Seq. ID 1584116
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 382:
gaacaacgat cctcgccgca gtgaagaaga cggtggatcc gaagacgcaa gttatctaca 60
atcaaaaccc cgacaccaat ttcgtcaagg ccggtgactt cgattacgca atcgtggccg 120
tcggagagaa accgtacgcg gagggattcg gagacagtac gaacttaacc atatcggaac 180
caggtccgag cacgatcggg aacgtgtgcg catcggtgaa atgtgtggtg gtggttgtat 240
cgggacgtcc ggtggtgatg cagccgtaca tttcgaacat cgatgctcta gtggcggcgt 300
ggcttccggg aacggaaggt caaggagtgg ctgatgtttt gttcggagat tatggattca 360
ccgggaaatt ggctcggacg tggtttaaga cagtggatca gttgccgatg aacgttggtg 420

(2) INFORMATION FOR SEQ ID NO: 383: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide
(B) LOCATION: 1..139
(D) OTHER INFORMATION: / Ceres Seq. ID 1584117
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 383:
Thr Thr Ile Leu Ala Ala Val Lys Lys Thr Val Asp Pro Lys Thr Gln
1 5 10 15

Val Ile Tyr Asn Gln Asn Pro Asp Thr Asn Phe Val Lys Ala Gly Asp
20 25 30

Phe Asp Tyr Ala Ile Val Ala Val Gly Glu Lys Pro Tyr Ala Glu Gly
35 40 45

Phe Gly Asp Ser Thr Asn Leu Thr Ile Ser Glu Pro Gly Pro Ser Thr
50 55 60

Ile Gly Asn Val Cys Ala Ser Val Lys Cys Val Val Val Val Val Ser
65 70 75 80

Gly Arg Pro Val Val Met Gln Pro Tyr Ile Ser Asn Ile Asp Ala Leu
85 90 95

Val Ala Ala Trp Leu Pro Gly Thr Glu Gly Gln Gly Val Ala Asp Val
100 105 110

Leu Phe Gly Asp Tyr Gly Phe Thr Gly Lys Leu Ala Arg Thr Trp Phe
115 120 125

Lys Thr Val Asp Gln Leu Pro Met Asn Val Gly
130 135
(2) INFORMATION FOR SEQ ID NO: 384: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 288 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..288
(D) OTHER INFORMATION: / Ceres Seq. ID 1584187
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 384:
ggatacaatt cccgtattga ggccactgtc gatcgttgcc ttgctccgtt ttacgaagac 60
cgattcgttg aaaataagtg gaagacgatc acctctttct tggtccgtaa ggcaactgac 120
tcagtcagag aaacaaagca tgaatacggc atactgttta tggatcggac cgtcgtcgtt 180
caggccccac cgagatcacc accagttcca gattccgatt tcactccctt cgactacatt 240
ttggagaaat ctgcttacaa gaacgtttta gtcggtgagg aggaataa
(2) INFORMATION FOR SEQ ID NO: 385: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 95 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..95
(D) OTHER INFORMATION: / Ceres Seq. ID 1584188
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 385:
Gly Tyr Asn Ser Arg Ile Glu Ala Thr Val Asp Arg Cys Leu Ala Pro
1 5 10 15

Phe Tyr Glu Asp Arg Phe Val Glu Asn Lys Trp Lys Thr Ile Thr Ser
20 25 30

Phe Leu Val Arg Lys Ala Thr Asp Ser Val Arg Glu Thr Lys His Glu
35 40 45

Tyr Gly Ile Leu Phe Met Asp Arg Thr Val Val Val Gln Ala Pro Pro
50 55 60

Arg Ser Pro Pro Val Pro Asp Ser Asp Phe Thr Pro Phe Asp Tyr Ile
65 70 75 80

Leu Glu Lys Ser Ala Tyr Lys Asn Val Leu Val Gly Glu Glu Glu
85 90 95

(2) INFORMATION FOR SEQ ID NO: 386:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 89 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..89
(D) OTHER INFORMATION: / Ceres Seq. ID 1584189
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 386:
Asp Thr Ile Pro Val Leu Arg Pro Leu Ser Ile Val Ala Leu Leu Arg
1 5 10 15

Phe Thr Lys Thr Asp Ser Leu Lys Ile Ser Gly Arg Arg Ser Pro Leu
20 25 30

Ser Trp Ser Val Arg Gln Leu Thr Gln Ser Glu Lys Gln Ser Met Asn
35 40 45

Thr Ala Tyr Cys Leu Trp Ile Gly Pro Ser Ser Phe Arg Pro His Arg
50 55 60

Asp His His Gln Phe Gln Ile Pro Ile Ser Leu Pro Ser Thr Thr Phe
65 70 75 80

Trp Arg Asn Leu Leu Thr Arg Thr Phe
85

(2) INFORMATION FOR SEQ ID NO: 387:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 510 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..510
(D) OTHER INFORMATION: / Ceres Seq. ID 1584335
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 387:
aaggacaaga ttcttggagg gaagatcatc cgtgtggagg ctcatccgat tcctgaacat 60
ccgagaccac gtaggctctc gaaacgtgtg gctcttgtag gtgatgctgc agggtatgtg 120
actaaatgct ctggtgaagg gatctacttt gctgctaaga gtggaagaat gtgtgctgaa 180
gccattgtcg aaggttcaca gaatggtaag aagatgattg acgaagggga cttgaggaag 240
tacttggaga aatgggataa gacatacttg cctacctaca gggtacttga tgtgttgcag 300
aaagtgtttt acagatcaaa tccggctaga gaagcgtttg tggagatgtg taatgatgag 360
tatgttcaga agatgacatt cgatagctat ctgtacaagc gggttgcgcc gggtagtcct 420
ttggaggata tcaagttggc tgtgaacacc attggaagtt tggttagggc taatgctcta 480
aggagagaga ttgagaagct tagtgtttaa
(2) INFORMATION FOR SEQ ID NO: 388:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 169 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..169
(D) OTHER INFORMATION: / Ceres Seq. ID 1584336
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 388:
Lys Asp Lys Ile Leu Gly Gly Lys Ile Ile Arg Val Glu Ala His Pro
1 5 10 15

Ile Pro Glu His Pro Arg Pro Arg Arg Leu Ser Lys Arg Val Ala Leu
20 25 30

Val Gly Asp Ala Ala Gly Tyr Val Thr Lys Cys Ser Gly Glu Gly Ile
35 40 45

Tyr Phe Ala Ala Lys Ser Gly Arg Met Cys Ala Glu Ala Ile Val Glu
50 55 60

Gly Ser Gln Asn Gly Lys Lys Met Ile Asp Glu Gly Asp Leu Arg Lys
65 70 75 80

Tyr Leu Glu Lys Trp Asp Lys Thr Tyr Leu Pro Thr Tyr Arg Val Leu
85 90 95

Asp Val Leu Gln Lys Val Phe Tyr Arg Ser Asn Pro Ala Arg Glu Ala
100 105 110

Phe Val Glu Met Cys Asn Asp Glu Tyr Val Gln Lys Met Thr Phe Asp
115 120 125

Ser Tyr Leu Tyr Lys Arg Val Ala Pro Gly Ser Pro Leu Glu Asp Ile
130 135 140

Lys Leu Ala Val Asn Thr Ile Gly Ser Leu Val Arg Ala Asn Ala Leu
145 150 155 160

Arg Arg Glu Ile Glu Lys Leu Ser Val
165

(2) INFORMATION FOR SEQ ID NO: 389:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 113 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..113
(D) OTHER INFORMATION: / Ceres Seq. ID 1584337
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 389:
Met Cys Ala Glu Ala Ile Val Glu Gly Ser Gln Asn Gly Lys Lys Met
1 5 10 15

Ile Asp Glu Gly Asp Leu Arg Lys Tyr Leu Glu Lys Trp Asp Lys Thr
20 25 30

Tyr Leu Pro Thr Tyr Arg Val Leu Asp Val Leu Gln Lys Val Phe Tyr
35 40 45

Arg Ser Asn Pro Ala Arg Glu Ala Phe Val Glu Met Cys Asn Asp Glu
50 55 60

Tyr Val Gln Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Arg Val Ala
65 70 75 80

Pro Gly Ser Pro Leu Glu Asp Ile Lys Leu Ala Val Asn Thr Ile Gly
85 90 95

Ser Leu Val Arg Ala Asn Ala Leu Arg Arg Glu Ile Glu Lys Leu Ser
100 105 110

Val

(2) INFORMATION FOR SEQ ID NO: 390:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 98 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..98
(D) OTHER INFORMATION: / Ceres Seq. ID 1584338
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 390:
Met Ile Asp Glu Gly Asp Leu Arg Lys Tyr Leu Glu Lys Trp Asp Lys
1 5 10 15

Thr Tyr Leu Pro Thr Tyr Arg Val Leu Asp Val Leu Gln Lys Val Phe
20 25 30

Tyr Arg Ser Asn Pro Ala Arg Glu Ala Phe Val Glu Met Cys Asn Asp
35 40 45

Glu Tyr Val Gln Lys Met Thr Phe Asp Ser Tyr Leu Tyr Lys Arg Val
50 55 60

Ala Pro Gly Ser Pro Leu Glu Asp Ile Lys Leu Ala Val Asn Thr Ile
65 70 75 80

Gly Ser Leu Val Arg Ala Asn Ala Leu Arg Arg Glu Ile Glu Lys Leu
85 90 95

Ser Val

(2) INFORMATION FOR SEQ ID NO: 391:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 824 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..824
(D) OTHER INFORMATION: / Ceres Seq. ID 1584543
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 391:
attagttaag aatccgattg atgttcacaa tngtgttgtt tcgatttcga gtgaaagata 60
cttatcttgt gcagtggttg ggaatagtgg tacattgcta aatagtcaat acggagatct 120
cattgataag catgagattg tgattcggtt gaacaatgcg aaaacagaga ggtttgagaa 180
gaaggttgga tcgaaaacaa atatatcttt cataaatagt aacattttgc atcaatgtgg 240
gagaagggag agttgctatt gtcatcctta tggtgaaaca gcgcctattg tgatgtacat 300
ttgccaaccg attcatgttt tggattacac tttgtgtaaa ccatctcatc gagctcctct 360
gcttatcacg gatccgaggt tcgatgtcat gtgtgctaga atcgtgaagt attactcggt 420
gaagaagttc ttggaagaga agaaagcgaa agggtttgtc gattggagta aagatcatga 480
aggctcgttg tttcactatt cgtcgggtat gcaggctgtg atgcttgcag tggggatttg 540
tgagaaagtt agtgtcttcg ggtttgggaa gttgaattca accaagcacc attaccatac 600
taatcagaaa gcagagctga agcttcatga ctatgaagca gagtatagat tgtatcgaga 660
cctggaaaac agtcccaggg ccattccatt cttaccaaaa gaattcaaga tccctttgtt 720
attgttgttc aagcgaagaa gcttgttggc aagagtgtcg ttttgtccaa gtcccaaaga 780
ttttgcttca gattgggcag tgaagatgca acactttccc aatg
(2) INFORMATION FOR SEQ ID NO: 392:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 274 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..274
(D) OTHER INFORMATION: / Ceres Seq. ID 1584544
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 392:
Leu Val Lys Asn Pro Ile Asp Val His Asn Xaa Val Val Ser Ile Ser
1 5 10 15

Ser Glu Arg Tyr Leu Ser Cys Ala Val Val Gly Asn Ser Gly Thr Leu
20 25 30

Leu Asn Ser Gln Tyr Gly Asp Leu Ile Asp Lys His Glu Ile Val Ile
35 40 45

Arg Leu Asn Asn Ala Lys Thr Glu Arg Phe Glu Lys Lys Val Gly Ser
50 55 60

Lys Thr Asn Ile Ser Phe Ile Asn Ser Asn Ile Leu His Gln Cys Gly
65 70 75 80

Arg Arg Glu Ser Cys Tyr Cys His Pro Tyr Gly Glu Thr Ala Pro Ile
85 90 95

Val Met Tyr Ile Cys Gln Pro Ile His Val Leu Asp Tyr Thr Leu Cys
100 105 110

Lys Pro Ser His Arg Ala Pro Leu Leu Ile Thr Asp Pro Arg Phe Asp
115 120 125

Val Met Cys Ala Arg Ile Val Lys Tyr Tyr Ser Val Lys Lys Phe Leu
130 135 140

Glu Glu Lys Lys Ala Lys Gly Phe Val Asp Trp Ser Lys Asp His Glu
145 150 155 160

Gly Ser Leu Phe His Tyr Ser Ser Gly Met Gln Ala Val Met Leu Ala
165 170 175

Val Gly Ile Cys Glu Lys Val Ser Val Phe Gly Phe Gly Lys Leu Asn
180 185 190

Ser Thr Lys His His Tyr His Thr Asn Gln Lys Ala Glu Leu Lys Leu
195 200 205

His Asp Tyr Glu Ala Glu Tyr Arg Leu Tyr Arg Asp Leu Glu Asn Ser
210 215 220

Pro Arg Ala Ile Pro Phe Leu Pro Lys Glu Phe Lys Ile Pro Leu Leu
225 230 235 240

Leu Leu Phe Lys Arg Arg Ser Leu Leu Ala Arg Val Ser Phe Cys Pro
245 250 255

Ser Pro Lys Asp Phe Ala Ser Asp Trp Ala Val Lys Met Gln His Phe
260 265 270

Pro Asn

(2) INFORMATION FOR SEQ ID NO: 393: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 177 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide
(B) LOCATION: 1..177
(D) OTHER INFORMATION: / Ceres Seq. ID 1584545
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 393:
Met Tyr Ile Cys Gln Pro Ile His Val Leu Asp Tyr Thr Leu Cys Lys
1 5 10 15

Pro Ser His Arg Ala Pro Leu Leu Ile Thr Asp Pro Arg Phe Asp Val
20 25 30

Met Cys Ala Arg Ile Val Lys Tyr Tyr Ser Val Lys Lys Phe Leu Glu
35 40 45

Glu Lys Lys Ala Lys Gly Phe Val Asp Trp Ser Lys Asp His Glu Gly
50 55 60

Ser Leu Phe His Tyr Ser Ser Gly Met Gln Ala Val Met Leu Ala Val
65 70 75 80

Gly Ile Cys Glu Lys Val Ser Val Phe Gly Phe Gly Lys Leu Asn Ser
85 90 95

Thr Lys His His Tyr His Thr Asn Gln Lys Ala Glu Leu Lys Leu His
100 105 110

Asp Tyr Glu Ala Glu Tyr Arg Leu Tyr Arg Asp Leu Glu Asn Ser Pro
115 120 125

Arg Ala Ile Pro Phe Leu Pro Lys Glu Phe Lys Ile Pro Leu Leu Leu
130 135 140

Leu Phe Lys Arg Arg Ser Leu Leu Ala Arg Val Ser Phe Cys Pro Ser
145 150 155 160

Pro Lys Asp Phe Ala Ser Asp Trp Ala Val Lys Met Gln His Phe Pro
165 170 175

Asn

(2) INFORMATION FOR SEQ ID NO: 394:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 145 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..145
(D) OTHER INFORMATION: / Ceres Seq. ID 1584546
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 394:
Met Cys Ala Arg Ile Val Lys Tyr Tyr Ser Val Lys Lys Phe Leu Glu
1 5 10 15

Glu Lys Lys Ala Lys Gly Phe Val Asp Trp Ser Lys Asp His Glu Gly
20 25 30

Ser Leu Phe His Tyr Ser Ser Gly Met Gln Ala Val Met Leu Ala Val
35 40 45

Gly Ile Cys Glu Lys Val Ser Val Phe Gly Phe Gly Lys Leu Asn Ser
50 55 60

Thr Lys His His Tyr His Thr Asn Gln Lys Ala Glu Leu Lys Leu His
65 70 75 80

Asp Tyr Glu Ala Glu Tyr Arg Leu Tyr Arg Asp Leu Glu Asn Ser Pro
85 90 95

Arg Ala Ile Pro Phe Leu Pro Lys Glu Phe Lys Ile Pro Leu Leu Leu
100 105 110

Leu Phe Lys Arg Arg Ser Leu Leu Ala Arg Val Ser Phe Cys Pro Ser
115 120 125

Pro Lys Asp Phe Ala Ser Asp Trp Ala Val Lys Met Gln His Phe Pro
130 135 140

Asn
145

(2) INFORMATION FOR SEQ ID NO: 395: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1283 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1283 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585005 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 395: 
aagcaacgta ttctccagga ggtttccacc atctttcttc tgtaagacaa aagaagaagg 60
cgtccaagtc aaaacatgag ttaccttttg aaactcgtta cttccctcaa aatcttgacc 120
acttcagttt cacaccagac agctacaaag tcttccacca gaagtacctc atcaacaacc 180
gtttctggcg aaaaggtggt cccatctttg tctacactgg aaatgaagga aacatcgact 240
ggtttgcttc caacaccggt ttcatgctgg atattgctcc caagttccgg gctcttcttg 300
ttttcattga acaccggttc tatggagaat caacgccatt tgggaagaag tcgcataagt 360
cagctgagac attgggttac ctaaactctc agcaagcgtt ggctgattat gcaatcctga 420
taagaagctt gaagcagaat ctaacgtctg aggcatctcc tgtggttgtc tttggtggct 480
cttatggtgg aatgcttgca gcgtggttca gactcaagta tccccacata acaatcggtg 540
cattggcatc ctccgctcca atacttcatt tcgataacat tgtaccattg acaagcttct 600
atgatgccat ttctcaggat tttaaggatg caagtattaa ttgtttcaaa gtcatcaaga 660
gaagctggga agagctagag gcagtttcaa ctatgaaaaa tggcttgcaa gaactcagca 720
aaaagttccg aacttgcaag ggccttcatt ctcaatattc agccagagat tggttaagtg 780
gagcatttgt ttatacagcc atggttaatt atccaactgc agctaatttc atggcgccac 840
tgcctggtta tcccgtagag cagatgtgca agatcatcga cggcttccct cgaggatcca 900
gtaatcttga ccgtgccttt gctgctgcga gcttatacta caactattca ggatcagaaa 960
aatgcttcga aatggaacaa caaactgatg atcatggact tgatggttgg caatatcaga 1020
ggatagagac agtactgaag agatttggaa gcaacatcat attctccaat ggaatgcagg 1080
acccttggag ccgtggaggg gttctgaaga acatttcaag tagcatcgtt gcgcttgtga 1140
ccaagaaagg agctcaccat gcagatctca gggctgctac aaaagatgac ccagagtggc 1200
tgaaagagca gaggaggcaa gaggttgcca ttatagagaa atggatcagt gagtattata 1260
gagatttaag agaagagcaa tag
(2) INFORMATION FOR SEQ ID NO: 396:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 426 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:
(A) NAME/KEY: peptide
(B) LOCATION: 1..426
(D) OTHER INFORMATION: / Ceres Seq. ID 1585006
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 396:
Ala Thr Tyr Ser Pro Gly Gly Phe His His Leu Ser Ser Val Arg Gln
1 5 10 15

Lys Lys Lys Ala Ser Lys Ser Lys His Glu Leu Pro Phe Glu Thr Arg
20 25 30

Tyr Phe Pro Gln Asn Leu Asp His Phe Ser Phe Thr Pro Asp Ser Tyr
35 40 45

Lys Val Phe His Gln Lys Tyr Leu Ile Asn Asn Arg Phe Trp Arg Lys
50 55 60

Gly Gly Pro Ile Phe Val Tyr Thr Gly Asn Glu Gly Asn Ile Asp Trp
65 70 75 80

Phe Ala Ser Asn Thr Gly Phe Met Leu Asp Ile Ala Pro Lys Phe Arg
85 90 95

Ala Leu Leu Val Phe Ile Glu His Arg Phe Tyr Gly Glu Ser Thr Pro
100 105 110

Phe Gly Lys Lys Ser His Lys Ser Ala Glu Thr Leu Gly Tyr Leu Asn
115 120 125

Ser Gln Gln Ala Leu Ala Asp Tyr Ala Ile Leu Ile Arg Ser Leu Lys
130 135 140

Gln Asn Leu Thr Ser Glu Ala Ser Pro Val Val Val Phe Gly Gly Ser
145 150 155 160

Tyr Gly Gly Met Leu Ala Ala Trp Phe Arg Leu Lys Tyr Pro His Ile
165 170 175

Thr Ile Gly Ala Leu Ala Ser Ser Ala Pro Ile Leu His Phe Asp Asn
180 185 190

Ile Val Pro Leu Thr Ser Phe Tyr Asp Ala Ile Ser Gln Asp Phe Lys
195 200 205

Asp Ala Ser Ile Asn Cys Phe Lys Val Ile Lys Arg Ser Trp Glu Glu
210 215 220

Leu Glu Ala Val Ser Thr Met Lys Asn Gly Leu Gln Glu Leu Ser Lys
225 230 235 240

Lys Phe Arg Thr Cys Lys Gly Leu His Ser Gln Tyr Ser Ala Arg Asp
245 250 255

Trp Leu Ser Gly Ala Phe Val Tyr Thr Ala Met Val Asn Tyr Pro Thr
260 265 270

Ala Ala Asn Phe Met Ala Pro Leu Pro Gly Tyr Pro Val Glu Gln Met
275 280 285

Cys Lys Ile Ile Asp Gly Phe Pro Arg Gly Ser Ser Asn Leu Asp Arg
290 295 300

Ala Phe Ala Ala Ala Ser Leu Tyr Tyr Asn Tyr Ser Gly Ser Glu Lys
305 310 315 320

Cys Phe Glu Met Glu Gln Gln Thr Asp Asp His Gly Leu Asp Gly Trp
325 330 335

Gln Tyr Gln Arg Ile Glu Thr Val Leu Lys Arg Phe Gly Ser Asn Ile
340 345 350

Ile Phe Ser Asn Gly Met Gln Asp Pro Trp Ser Arg Gly Gly Val Leu
355 360 365

Lys Asn Ile Ser Ser Ser Ile Val Ala Leu Val Thr Lys Lys Gly Ala
370 375 380

His His Ala Asp Leu Arg Ala Ala Thr Lys Asp Asp Pro Glu Trp Leu
385 390 395 400

Lys Glu Gln Arg Arg Gln Glu Val Ala Ile Ile Glu Lys Trp Ile Ser
405 410 415

Glu Tyr Tyr Arg Asp Leu Arg Glu Glu Gln
420 425



(2) INFORMATION FOR SEQ ID NO: 397: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 339 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide
(B) LOCATION: 1..339
(D) OTHER INFORMATION: / Ceres Seq. ID 1585007
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 397:
Met Leu Asp Ile Ala Pro Lys Phe Arg Ala Leu Leu Val Phe Ile Glu
1 5 10 15

His Arg Phe Tyr Gly Glu Ser Thr Pro Phe Gly Lys Lys Ser His Lys
20 25 30

Ser Ala Glu Thr Leu Gly Tyr Leu Asn Ser Gln Gln Ala Leu Ala Asp
35 40 45

Tyr Ala Ile Leu Ile Arg Ser Leu Lys Gln Asn Leu Thr Ser Glu Ala
50 55 60

Ser Pro Val Val Val Phe Gly Gly Ser Tyr Gly Gly Met Leu Ala Ala
65 70 75 80

Trp Phe Arg Leu Lys Tyr Pro His Ile Thr Ile Gly Ala Leu Ala Ser
85 90 95

Ser Ala Pro Ile Leu His Phe Asp Asn Ile Val Pro Leu Thr Ser Phe
100 105 110

Tyr Asp Ala Ile Ser Gln Asp Phe Lys Asp Ala Ser Ile Asn Cys Phe
115 120 125

Lys Val Ile Lys Arg Ser Trp Glu Glu Leu Glu Ala Val Ser Thr Met
130 135 140

Lys Asn Gly Leu Gln Glu Leu Ser Lys Lys Phe Arg Thr Cys Lys Gly
145 150 155 160

Leu His Ser Gln Tyr Ser Ala Arg Asp Trp Leu Ser Gly Ala Phe Val
165 170 175

Tyr Thr Ala Met Val Asn Tyr Pro Thr Ala Ala Asn Phe Met Ala Pro
180 185 190

Leu Pro Gly Tyr Pro Val Glu Gln Met Cys Lys Ile Ile Asp Gly Phe
195 200 205








Pro 


Arg 


Gly 


Ser 


Ser 


Asn 


Leu 


Asp 


Arg 


Ala 


Phe 


Ala 


Ala 


Ala 


Ser 


Leu 




210 








215 










220 










Tyr 


Tyr 


Asn 


Tyr 


Ser 


Gly 


Ser 


Glu 


Lys 


Cys 


Phe 


Glu 


Met 


Glu 


Gin 


Gin 


225 






230 










235 










240 


Thr 


Asp 


Asp 


His 


Gly 


Leu 


Asp 


Gly 


Trp 


Gin 


Tyr 


Gin 


Arg 


He 


Glu 


Thr 








245 










250 










255 




Val 


Leu 


Lys 


Arg 


Phe 


Gly 


Ser 


Asn 


He 


He 


Phe 


Ser 


Asn 


Gly 


Met 


Gin 






260 










265 










270 






Asp 


Pro 


Trp 


Ser 


Arg 


Gly 


Gly 


Val 


Leu 


Lys 


Asn 


He 


Ser 


Ser 


Ser 


He 




275 










280 










285 








Val 


Ala 


Leu 


Val 


Thr 


Lys 


Lys 


Gly Ala 


His 


His 


Ala 


Asp 


Leu 


Arg 


Ala 




290 










295 










300 










Ala 


Thr 


Lys 


Asp 


Asp 


Pro 


Glu 


Trp 


Leu 


Lys 


Glu 


Gin 


Arg 


Arg 


Gin 


Glu 


305 








310 










315 










320 


Val 


Ala 


He 


He 


Glu 


Lys 


Trp 


He 


Ser 


Glu 


Tyr 


Tyr 


Arg 


Asp 


Leu 


Arg 










325 








330 










335 




Glu 


Glu 


Gin 





























(2) INFORMATION FOR SEQ ID NO: 398: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..263

(D) OTHER INFORMATION: / Ceres Seq. ID 1585008 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 398: 
Met Leu Ala Ala Trp Phe Arg Leu Lys Tyr Pro His Ile Thr Ile Gly
1 5 10 15

Ala Leu Ala Ser Ser Ala Pro Ile Leu His Phe Asp Asn Ile Val Pro
20 25 30

Leu Thr Ser Phe Tyr Asp Ala Ile Ser Gln Asp Phe Lys Asp Ala Ser
35 40 45

Ile Asn Cys Phe Lys Val Ile Lys Arg Ser Trp Glu Glu Leu Glu Ala
50 55 60

Val Ser Thr Met Lys Asn Gly Leu Gln Glu Leu Ser Lys Lys Phe Arg
65 70 75 80

Thr Cys Lys Gly Leu His Ser Gln Tyr Ser Ala Arg Asp Trp Leu Ser
85 90 95

Gly Ala Phe Val Tyr Thr Ala Met Val Asn Tyr Pro Thr Ala Ala Asn
100 105 110

Phe Met Ala Pro Leu Pro Gly Tyr Pro Val Glu Gln Met Cys Lys Ile
115 120 125

Ile Asp Gly Phe Pro Arg Gly Ser Ser Asn Leu Asp Arg Ala Phe Ala
130 135 140

Ala Ala Ser Leu Tyr Tyr Asn Tyr Ser Gly Ser Glu Lys Cys Phe Glu
145 150 155 160

Met Glu Gln Gln Thr Asp Asp His Gly Leu Asp Gly Trp Gln Tyr Gln
165 170 175

Arg Ile Glu Thr Val Leu Lys Arg Phe Gly Ser Asn Ile Ile Phe Ser
180 185 190

Asn Gly Met Gln Asp Pro Trp Ser Arg Gly Gly Val Leu Lys Asn Ile
195 200 205

Ser Ser Ser Ile Val Ala Leu Val Thr Lys Lys Gly Ala His His Ala
210 215 220

Asp Leu Arg Ala Ala Thr Lys Asp Asp Pro Glu Trp Leu Lys Glu Gln
225 230 235 240

Arg Arg Gln Glu Val Ala Ile Ile Glu Lys Trp Ile Ser Glu Tyr Tyr
245 250 255

Arg Asp Leu Arg Glu Glu Gln
260

(2) INFORMATION FOR SEQ ID NO: 399: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1993 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..1993 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585020 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 399: 
tttcatgttg tcagtatacc tgaccaagtg ggtattagcc aaagatgaag gacctccaga 60
aatggttcag atatcagatg ctatacgtga tggagctgaa ggatttttaa gaacgcagta 120
tggtactata tccaaaatgg cgttcttgct agcttttgtg atcctttgca tatacttgtt 180
ccgtaactta accccccagc aagaggcttc tggtttagga aggacaatgt ctgcatacat 240
tactgttgct gcatttcttt tgggtgcgct atgttcaggt attgctggat atgttggaat 300
gtgggtgtca gttcgtgcta atgtacgggt ttccagtgct gctagacgat ccgcaaggga 360
agcattgcag atagctgttc gtgctggtgg attctcagct ctggttgttg ttggtatggc 420
tgtgattggt attgccatcc tgtattctac attttatgta tggttggacg tggattcacc 480
tggctcaatg aaggttactg atctgcctct tcttcttgtg ggatatggtt ttggtgcatc 540
atttgttgcc ttatttgctc agttgggtgg tggaatatat actaagggag ctgatgtcgg 600
ggcagatctg gtcgggaaag ttgagcacgg tattcctgag gatgaccctc ggaaccctgc 660
agttatagct gatttggttg gagacaatgt tggggactgc gctgctcgag gtgctgattt 720
atttgaaagt atagctgcag aaatcatcag tgcaatgata cttgggggta caatggctca 780
gaagtgcaaa attgaagatc catctggttt tattttattt cctctagttg ttcactcgtt 840
tgacttggta atatcatcaa ttggtattct atcgatcaaa ggaactcgca atgctagtgt 900
gaaatctcca gtagaggatc caatggttgt tcttcagaaa ggatattcgc tgactattat 960
attagctgtt ttgacatttg gcgcgtcgac tcgctggctg ctatacacag aacaagctcc 1020
atctgcttgg ttgaatttct tcatgtgtgg cttagttggc attatcacag cctatgtctt 1080
tgtctggata tccagatatt atactgacta taagtatgag cctgttcgga cgttggctct 1140
tgctagctcc actggtcatg gaaccaatat aattgctggg gtcagcttgg gtctggaatc 1200
cacggctctt cctgttttgg ttataagtgt agctatcatt tctgcttttt ggctgggcaa 1260
tacctcggga ctaatagatg aaaagggaaa ccccactgga ggtctatttg gaacagctgt 1320
agctacaatg ggaatgttga gcactgcagc ttatgttctt acaatggaca tgtttggtcc 1380
catagctgat aatgcaggtg ggattgttga gatgagccag cagccagaaa gtgtccgtga 1440
gatcactgat gttcttgatg ctgttgggaa taccacaaaa gcaacaacaa aagggtttgc 1500
tattggatct gctgcccttg catcattcct tctttttagt gcgtatatgg acgaggtttc 1560
agcgtttgct aatgtatctt ctaaagaggt tgatattgcg atcccagaag tcttcattgg 1620
agggttatta ggtgccatgc ttatattcct gtttagcgct tgggcttgtg cagcagttgg 1680
tcgaactgca caggaggttg tcaacgaagt aagaagacag ttcattgaga ggcctggcat 1740
aatggactac aaggagaagc cagattatgg tcgatgtgtc gccattgtcg catcttcagc 1800
tttgagggaa atgataaaac caggagcttt ggctataata tcacccattg cagttggttt 1860
tgtgttccgg atcttgggat actacactgg acaacctttg cttggagcta aagtagtagc 1920
tgcaatgcta atgtttgcga cacaggaggt gcttgggaca atgcaaagaa atacattgag 1980
actggagctc tag

(2) INFORMATION FOR SEQ ID NO: 400:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 663 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..663

(D) OTHER INFORMATION: / Ceres Seq. ID 1585021
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 400: 
Phe Met Leu Ser Val Tyr Leu Thr Lys Trp Val Leu Ala Lys Asp Glu
1 5 10 15

Gly Pro Pro Glu Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala
20 25 30

Glu Gly Phe Leu Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe
35 40 45

Leu Leu Ala Phe Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr
50 55 60

Pro Gln Gln Glu Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile
65 70 75 80

Thr Val Ala Ala Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly
85 90 95

Tyr Val Gly Met Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser
100 105 110

Ala Ala Arg Arg Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala
115 120 125

Gly Gly Phe Ser Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile
130 135 140

Ala Ile Leu Tyr Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro
145 150 155 160

Gly Ser Met Lys Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly
165 170 175

Phe Gly Ala Ser Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile
180 185 190

Tyr Thr Lys Gly Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu
195 200 205

His Gly Ile Pro Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp
210 215 220

Leu Val Gly Asp Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu
225 230 235 240

Phe Glu Ser Ile Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly
245 250 255

Thr Met Ala Gln Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu
260 265 270

Phe Pro Leu Val Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly
275 280 285

Ile Leu Ser Ile Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val
290 295 300

Glu Asp Pro Met Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile
305 310 315 320

Leu Ala Val Leu Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr
325 330 335

Glu Gln Ala Pro Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val
340 345 350

Gly Ile Ile Thr Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr
355 360 365

Asp Tyr Lys Tyr Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr
370 375 380

Gly His Gly Thr Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser
385 390 395 400

Thr Ala Leu Pro Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe
405 410 415

Trp Leu Gly Asn Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr
420 425 430

Gly Gly Leu Phe Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr
435 440 445

Ala Ala Tyr Val Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn
450 455 460

Ala Gly Gly Ile Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu
465 470 475 480

Ile Thr Asp Val Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr
485 490 495

Lys Gly Phe Ala Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe
500 505 510

Ser Ala Tyr Met Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys
515 520 525

Glu Val Asp Ile Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly
530 535 540

Ala Met Leu Ile Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly
545 550 555 560

Arg Thr Ala Gln Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu
565 570 575

Arg Pro Gly Ile Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys
580 585 590

Val Ala Ile Val Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly
595 600 605

Ala Leu Ala Ile Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile
610 615 620

Leu Gly Tyr Tyr Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala
625 630 635 640

Ala Met Leu Met Phe Ala Thr Gln Glu Val Leu Gly Thr Met Gln Arg
645 650 655

Asn Thr Leu Arg Leu Glu Leu
660

(2) INFORMATION FOR SEQ ID NO:401: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 662 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..662

(D) OTHER INFORMATION: / Ceres Seq. ID 1585022 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 401:
Met Leu Ser Val Tyr Leu Thr Lys Trp Val Leu Ala Lys Asp Glu Gly
1 5 10 15

Pro Pro Glu Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala Glu
20 25 30

Gly Phe Leu Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe Leu
35 40 45

Leu Ala Phe Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr Pro
50 55 60

Gln Gln Glu Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile Thr
65 70 75 80

Val Ala Ala Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly Tyr
85 90 95

Val Gly Met Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser Ala
100 105 110

Ala Arg Arg Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala Gly
115 120 125

Gly Phe Ser Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile Ala
130 135 140

Ile Leu Tyr Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro Gly
145 150 155 160

Ser Met Lys Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly Phe
165 170 175

Gly Ala Ser Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile Tyr
180 185 190

Thr Lys Gly Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu His
195 200 205

Gly Ile Pro Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp Leu
210 215 220

Val Gly Asp Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu Phe
225 230 235 240

Glu Ser Ile Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly Thr
245 250 255

Met Ala Gln Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu Phe
260 265 270

Pro Leu Val Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly Ile
275 280 285

Leu Ser Ile Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val Glu
290 295 300

Asp Pro Met Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile Leu
305 310 315 320

Ala Val Leu Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr Glu
325 330 335

Gln Ala Pro Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val Gly
340 345 350

Ile Ile Thr Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr Asp
355 360 365

Tyr Lys Tyr Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr Gly
370 375 380

His Gly Thr Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser Thr
385 390 395 400

Ala Leu Pro Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe Trp
405 410 415

Leu Gly Asn Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr Gly
420 425 430

Gly Leu Phe Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr Ala
435 440 445

Ala Tyr Val Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn Ala
450 455 460

Gly Gly Ile Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu Ile
465 470 475 480

Thr Asp Val Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr Lys
485 490 495

Gly Phe Ala Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe Ser
500 505 510

Ala Tyr Met Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys Glu
515 520 525

Val Asp Ile Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly Ala
530 535 540

Met Leu Ile Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly Arg
545 550 555 560

Thr Ala Gln Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu Arg
565 570 575

Pro Gly Ile Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys Val
580 585 590

Ala Ile Val Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly Ala
595 600 605

Leu Ala Ile Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile Leu
610 615 620

Gly Tyr Tyr Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala Ala
625 630 635 640

Met Leu Met Phe Ala Thr Gln Glu Val Leu Gly Thr Met Gln Arg Asn
645 650 655

Thr Leu Arg Leu Glu Leu
660

(2) INFORMATION FOR SEQ ID NO: 402:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 643 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..643

(D) OTHER INFORMATION: / Ceres Seq. ID 1585023
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 402: 
Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala Glu Gly Phe Leu
1 5 10 15

Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe Leu Leu Ala Phe
20 25 30

Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr Pro Gln Gln Glu
35 40 45

Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile Thr Val Ala Ala
50 55 60

Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly Tyr Val Gly Met
65 70 75 80

Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser Ala Ala Arg Arg
85 90 95

Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala Gly Gly Phe Ser
100 105 110

Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile Ala Ile Leu Tyr
115 120 125

Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro Gly Ser Met Lys
130 135 140

Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly Phe Gly Ala Ser
145 150 155 160

Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile Tyr Thr Lys Gly
165 170 175

Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu His Gly Ile Pro
180 185 190

Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp Leu Val Gly Asp
195 200 205

Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu Phe Glu Ser Ile
210 215 220

Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly Thr Met Ala Gln
225 230 235 240

Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu Phe Pro Leu Val
245 250 255

Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly Ile Leu Ser Ile
260 265 270

Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val Glu Asp Pro Met
275 280 285

Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile Leu Ala Val Leu
290 295 300

Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr Glu Gln Ala Pro
305 310 315 320

Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val Gly Ile Ile Thr
325 330 335

Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr Asp Tyr Lys Tyr
340 345 350

Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr Gly His Gly Thr
355 360 365

Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser Thr Ala Leu Pro
370 375 380

Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe Trp Leu Gly Asn
385 390 395 400

Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr Gly Gly Leu Phe
405 410 415

Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr Ala Ala Tyr Val
420 425 430

Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn Ala Gly Gly Ile
435 440 445

Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu Ile Thr Asp Val
450 455 460

Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr Lys Gly Phe Ala
465 470 475 480

Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe Ser Ala Tyr Met
485 490 495

Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys Glu Val Asp Ile
500 505 510

Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly Ala Met Leu Ile
515 520 525

Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly Arg Thr Ala Gln
530 535 540

Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu Arg Pro Gly Ile
545 550 555 560

Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys Val Ala Ile Val
565 570 575

Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly Ala Leu Ala Ile
580 585 590

Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile Leu Gly Tyr Tyr
595 600 605

Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala Ala Met Leu Met
610 615 620

Phe Ala Thr Gln Glu Val Leu Gly Thr Met Gln Arg Asn Thr Leu Arg
625 630 635 640

Leu Glu Leu

(2) INFORMATION FOR SEQ ID NO: 403:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2080 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..2080

(D) OTHER INFORMATION: / Ceres Seq. ID 1585024
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 403:
tttcatgttg tcagtatacc tgaccaagtg ggtattagcc aaagatgaag gacctccaga 60
aatggttcag atatcagatg ctatacgtga tggagctgaa ggatttttaa gaacgcagta 120
tggtactata tccaaaatgg cgttcttgct agcttttgtg atcctttgca tatacttgtt 180
ccgtaactta accccccagc aagaggcttc tggtttagga aggacaatgt ctgcatacat 240
tactgttgct gcatttcttt tgggtgcgct atgttcaggt attgctggat atgttggaat 300
gtgggtgtca gttcgtgcta atgtacgggt ttccagtgct gctagacgat ccgcaaggga 360
agcattgcag atagctgttc gtgctggtgg attctcagct ctggttgttg ttggtatggc 420
tgtgattggt attgccatcc tgtattctac attttatgta tggttggacg tggattcacc 480
tggctcaatg aaggttactg atctgcctct tcttcttgtg ggatatggtt ttggtgcatc 540
atttgttgcc ttatttgctc agttgggtgg tggaatatat actaagggag ctgatgtcgg 600
ggcagatctg gtcgggaaag ttgagcacgg tattcctgag gatgaccctc ggaaccctgc 660
agttatagct gatttggttg gagacaatgt tggggactgc gctgctcgag gtgctgattt 720
atttgaaagt atagctgcag aaatcatcag tgcaatgata cttgggggta caatggctca 780
gaagtgcaaa attgaagatc catctggttt tattttattt cctctagttg ttcactcgtt 840
tgacttggta atatcatcaa ttggtattct atcgatcaaa ggaactcgca atgctagtgt 900
gaaatctcca gtagaggatc caatggttgt tcttcagaaa ggatattcgc tgactattat 960
attagctgtt ttgacatttg gcgcgtcgac tcgctggctg ctatacacag aacaagctcc 1020
atctgcttgg ttgaatttct tcatgtgtgg cttagttggc attatcacag cctatgtctt 1080
tgtctggata tccagatatt atactgacta taagtatgag cctgttcgga cgttggctct 1140
tgctagctcc actggtcatg gaaccaatat aattgctggg gtcagcttgg gtctggaatc 1200
cacggctctt cctgttttgg ttataagtgt agctatcatt tctgcttttt ggctgggcaa 1260
tacctcggga ctaatagatg aaaagggaaa ccccactgga ggtctatttg gaacagctgt 1320
agctacaatg ggaatgttga gcactgcagc ttatgttctt acaatggaca tgtttggtcc 1380
catagctgat aatgcaggtg ggattgttga gatgagccag cagccagaaa gtgtccgtga 1440
gatcactgat gttcttgatg ctgttgggaa taccacaaaa gcaacaacaa aagggtttgc 1500
tattggatct gctgcccttg catcattcct tctttttagt gcgtatatgg acgaggtttc 1560
agcgtttgct aatgtatctt ctaaagaggt tgatattgcg atcccagaag tcttcattgg 1620
agggttatta ggtgccatgc ttatattcct gtttagcgct tgggcttgtg cagcagttgg 1680
tcgaactgca caggaggttg tcaacgaagt aagaagacag ttcattgaga ggcctggcat 1740
aatggactac aaggagaagc cagattatgg tcgatgtgtc gccattgtcg catcttcagc 1800
tttgagggaa atgataaaac caggagcttt ggctataata tcacccattg cagttggttt 1860
tgtgttccgg atcttgggat actacactgg acaacctttg cttggagcta aagtagtagc 1920
tgcaatgcta atgtttgcga cagtatgtgg tatcttaatg gctctattct taaatacagc 1980
aggaggtgct tgggacaatg caaagaaata cattgagact ggagctctag gaggcaaagg 2040
aagtgattcc cacaaagctg cagtaactgg tgatacgtaa
(2) INFORMATION FOR SEQ ID NO: 404:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 692 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..692

(D) OTHER INFORMATION: / Ceres Seq. ID 1585025
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 404:
Phe Met Leu Ser Val Tyr Leu Thr Lys Trp Val Leu Ala Lys Asp Glu
1 5 10 15

Gly Pro Pro Glu Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala
20 25 30

Glu Gly Phe Leu Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe
35 40 45

Leu Leu Ala Phe Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr
50 55 60

Pro Gln Gln Glu Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile
65 70 75 80

Thr Val Ala Ala Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly
85 90 95

Tyr Val Gly Met Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser
100 105 110

Ala Ala Arg Arg Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala
115 120 125

Gly Gly Phe Ser Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile
130 135 140

Ala Ile Leu Tyr Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro
145 150 155 160

Gly Ser Met Lys Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly
165 170 175

Phe Gly Ala Ser Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile
180 185 190

Tyr Thr Lys Gly Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu
195 200 205

His Gly Ile Pro Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp
210 215 220

Leu Val Gly Asp Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu
225 230 235 240

Phe Glu Ser Ile Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly
245 250 255

Thr Met Ala Gln Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu
260 265 270

Phe Pro Leu Val Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly
275 280 285

Ile Leu Ser Ile Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val
290 295 300

Glu Asp Pro Met Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile
305 310 315 320

Leu Ala Val Leu Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr
325 330 335

Glu Gln Ala Pro Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val
340 345 350

Gly Ile Ile Thr Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr
355 360 365

Asp Tyr Lys Tyr Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr
370 375 380

Gly His Gly Thr Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser
385 390 395 400

Thr Ala Leu Pro Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe
405 410 415

Trp Leu Gly Asn Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr
420 425 430

Gly Gly Leu Phe Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr
435 440 445

Ala Ala Tyr Val Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn
450 455 460

Ala Gly Gly Ile Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu
465 470 475 480

Ile Thr Asp Val Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr
485 490 495

Lys Gly Phe Ala Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe
500 505 510

Ser Ala Tyr Met Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys
515 520 525

Glu Val Asp Ile Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly
530 535 540

Ala Met Leu Ile Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly
545 550 555 560

Arg Thr Ala Gln Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu
565 570 575

Arg Pro Gly Ile Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys
580 585 590

Val Ala Ile Val Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly
595 600 605

Ala Leu Ala Ile Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile
610 615 620

Leu Gly Tyr Tyr Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala
625 630 635 640

Ala Met Leu Met Phe Ala Thr Val Cys Gly Ile Leu Met Ala Leu Phe
645 650 655

Leu Asn Thr Ala Gly Gly Ala Trp Asp Asn Ala Lys Lys Tyr Ile Glu
660 665 670

Thr Gly Ala Leu Gly Gly Lys Gly Ser Asp Ser His Lys Ala Ala Val
675 680 685

Thr Gly Asp Thr
690

(2) INFORMATION FOR SEQ ID NO: 405: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 691 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..691

(D) OTHER INFORMATION: / Ceres Seq. ID 1585026 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 405: 
Met Leu Ser Val Tyr Leu Thr Lys Trp Val Leu Ala Lys Asp Glu Gly
1 5 10 15

Pro Pro Glu Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala Glu
20 25 30

Gly Phe Leu Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe Leu
35 40 45

Leu Ala Phe Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr Pro
50 55 60

Gln Gln Glu Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile Thr
65 70 75 80

Val Ala Ala Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly Tyr
85 90 95

Val Gly Met Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser Ala
100 105 110

Ala Arg Arg Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala Gly
115 120 125

Gly Phe Ser Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile Ala
130 135 140

Ile Leu Tyr Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro Gly
145 150 155 160

Ser Met Lys Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly Phe
165 170 175

Gly Ala Ser Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile Tyr
180 185 190

Thr Lys Gly Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu His
195 200 205

Gly Ile Pro Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp Leu
210 215 220

Val Gly Asp Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu Phe
225 230 235 240

Glu Ser Ile Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly Thr
245 250 255

Met Ala Gln Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu Phe
260 265 270

Pro Leu Val Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly Ile
275 280 285

Leu Ser Ile Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val Glu
290 295 300

Asp Pro Met Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile Leu
305 310 315 320

Ala Val Leu Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr Glu
325 330 335

Gln Ala Pro Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val Gly
340 345 350

Ile Ile Thr Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr Asp
355 360 365

Tyr Lys Tyr Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr Gly
370 375 380

His Gly Thr Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser Thr
385 390 395 400

Ala Leu Pro Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe Trp
405 410 415

Leu Gly Asn Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr Gly
420 425 430

Gly Leu Phe Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr Ala
435 440 445

Ala Tyr Val Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn Ala
450 455 460

Gly Gly Ile Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu Ile
465 470 475 480

Thr Asp Val Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr Lys
485 490 495

Gly Phe Ala Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe Ser
500 505 510

Ala Tyr Met Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys Glu
515 520 525

Val Asp Ile Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly Ala
530 535 540

Met Leu Ile Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly Arg
545 550 555 560

Thr Ala Gln Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu Arg
565 570 575

Pro Gly Ile Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys Val
580 585 590

Ala Ile Val Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly Ala
595 600 605

Leu Ala Ile Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile Leu
610 615 620

Gly Tyr Tyr Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala Ala
625 630 635 640

Met Leu Met Phe Ala Thr Val Cys Gly Ile Leu Met Ala Leu Phe Leu
645 650 655

Asn Thr Ala Gly Gly Ala Trp Asp Asn Ala Lys Lys Tyr Ile Glu Thr
660 665 670

Gly Ala Leu Gly Gly Lys Gly Ser Asp Ser His Lys Ala Ala Val Thr
675 680 685

Gly Asp Thr
690

(2) INFORMATION FOR SEQ ID NO: 406:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 672 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..672

(D) OTHER INFORMATION: / Ceres Seq. ID 1585027
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 406:
Met Val Gln Ile Ser Asp Ala Ile Arg Asp Gly Ala Glu Gly Phe Leu
1 5 10 15

Arg Thr Gln Tyr Gly Thr Ile Ser Lys Met Ala Phe Leu Leu Ala Phe

20 25 30

Val Ile Leu Cys Ile Tyr Leu Phe Arg Asn Leu Thr Pro Gln Gln Glu

35 40 45

Ala Ala Ser Gly Leu Gly Arg Thr Met Ser Ala Tyr Ile Thr Val Ala

50 55 60

Phe Leu Leu Gly Ala Leu Cys Ser Gly Ile Ala Gly Tyr Val Gly Met
65 70 75 80

Trp Val Ser Val Arg Ala Asn Val Arg Val Ser Ser Ala Ala Arg Arg

85 90 95

Ser Ala Arg Glu Ala Leu Gln Ile Ala Val Arg Ala Gly Gly Phe Ser

100 105 110

Ala Leu Val Val Val Gly Met Ala Val Ile Gly Ile Ala Ile Leu Tyr

115 120 125

Ser Thr Phe Tyr Val Trp Leu Asp Val Asp Ser Pro Gly Ser Met Lys

130 135 140

Val Thr Asp Leu Pro Leu Leu Leu Val Gly Tyr Gly Phe Gly Ala Ser
145 150 155 160

Phe Val Ala Leu Phe Ala Gln Leu Gly Gly Gly Ile Tyr Thr Lys Gly

165 170 175

Ala Asp Val Gly Ala Asp Leu Val Gly Lys Val Glu His Gly Ile Pro

180 185 190

Glu Asp Asp Pro Arg Asn Pro Ala Val Ile Ala Asp Leu Val Gly Asp

195 200 205

Asn Val Gly Asp Cys Ala Ala Arg Gly Ala Asp Leu Phe Glu Ser Ile

210 215 220

Ala Ala Glu Ile Ile Ser Ala Met Ile Leu Gly Gly Thr Met Ala Gln
225 230 235 240

Lys Cys Lys Ile Glu Asp Pro Ser Gly Phe Ile Leu Phe Pro Leu Val

245 250 255

Val His Ser Phe Asp Leu Val Ile Ser Ser Ile Gly Ile Leu Ser Ile

260 265 270

Lys Gly Thr Arg Asn Ala Ser Val Lys Ser Pro Val Glu Asp Pro Met

275 280 285

Val Val Leu Gln Lys Gly Tyr Ser Leu Thr Ile Ile Leu Ala Val Leu

290 295 300

Thr Phe Gly Ala Ser Thr Arg Trp Leu Leu Tyr Thr Glu Gln Ala Pro
305 310 315 320

Ser Ala Trp Leu Asn Phe Phe Met Cys Gly Leu Val Gly Ile Ile Thr

325 330 335

Ala Tyr Val Phe Val Trp Ile Ser Arg Tyr Tyr Thr Asp Tyr Lys Tyr

340 345 350

Glu Pro Val Arg Thr Leu Ala Leu Ala Ser Ser Thr Gly His Gly Thr

355 360 365

Asn Ile Ile Ala Gly Val Ser Leu Gly Leu Glu Ser Thr Ala Leu Pro

370 375 380

Val Leu Val Ile Ser Val Ala Ile Ile Ser Ala Phe Trp Leu Gly Asn
385 390 395 400

Thr Ser Gly Leu Ile Asp Glu Lys Gly Asn Pro Thr Gly Gly Leu Phe

405 410 415

Gly Thr Ala Val Ala Thr Met Gly Met Leu Ser Thr Ala Ala Tyr Val

420 425 430

Leu Thr Met Asp Met Phe Gly Pro Ile Ala Asp Asn Ala Gly Gly Ile

435 440 445

Val Glu Met Ser Gln Gln Pro Glu Ser Val Arg Glu Ile Thr Asp Val

450 455 460

Leu Asp Ala Val Gly Asn Thr Thr Lys Ala Thr Thr Lys Gly Phe Ala
465 470 475 480

Ile Gly Ser Ala Ala Leu Ala Ser Phe Leu Leu Phe Ser Ala Tyr Met

485 490 495

Asp Glu Val Ser Ala Phe Ala Asn Val Ser Ser Lys Glu Val Asp Ile

500 505 510

Ala Ile Pro Glu Val Phe Ile Gly Gly Leu Leu Gly Ala Met Leu Ile

515 520 525

Phe Leu Phe Ser Ala Trp Ala Cys Ala Ala Val Gly Arg Thr Ala Gln

530 535 540

Glu Val Val Asn Glu Val Arg Arg Gln Phe Ile Glu Arg Pro Gly Ile
545 550 555 560

Met Asp Tyr Lys Glu Lys Pro Asp Tyr Gly Arg Cys Val Ala Ile Val

565 570 575

Ala Ser Ser Ala Leu Arg Glu Met Ile Lys Pro Gly Ala Leu Ala Ile
580 585 590

Ile Ser Pro Ile Ala Val Gly Phe Val Phe Arg Ile Leu Gly Tyr Tyr

595 600 605

Thr Gly Gln Pro Leu Leu Gly Ala Lys Val Val Ala Ala Met Leu Met

610 615 620

Phe Ala Thr Val Cys Gly Ile Leu Met Ala Leu Phe Leu Asn Thr Ala
625 630 635 640

Gly Gly Ala Trp Asp Asn Ala Lys Lys Tyr Ile Glu Thr Gly Ala Leu

645 650 655

Gly Gly Lys Gly Ser Asp Ser His Lys Ala Ala Val Thr Gly Asp Thr

660 665 670



(2) INFORMATION FOR SEQ ID NO: 407:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1296 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1296 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585047 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 407: 
atggctcaac ttcaacagaa ggtgaacaat caagaacagg caaatcgatc cttggtgcag 60 
caactcgaag cagctacctc ccaaggacag atcaggacta ctcgtttcag cgcgaggcat 120 
cttcaggatc gacgggcagc agcagatctc aatcccacac ggctcgtatt ccacacacct 180 
ggcaatacga caaggcccgt ccgcagaacc gcaccggaaa tcggaagaga ccgaaccgag 240
ccggcaattt tgggaaatcg ggaaacaaat cgaaacgaac cgcaactccc tcctccccga 300
gcagaggtta ccgaggccga tcatatcgac gcctcggaca atgaagactc cgaggagaat 360
attaggtggg ctgaagagta cgccagagaa caggaaataa gcgccatcaa gctatcccta 420
gccaaggcag agaacgagat gaagctcgtg agatcccaaa tgcataacgc agtctcctcg 480
gcaccaaaca tcgaccgcaa tctggaagaa tcccacaaca caccgttcac acacaagatc 540
tccaacgcga taatctcaga tccaggaaaa ctaagaatcg agtacttcaa cgggtcttcc 600
gacccgaaag gacatttaaa gtcgttcatc atctctgtgg cccgagtcaa attcagacca 660
gaagaaagag atgctggtct ctgtcacctg ttcgtcgagc acttgaaagg gccaaccctg 720
gattggttct cgagactcga aggaaattct gtggacagtt ttcaggagct atcgacgctc 780
ttcctgaagc aatattcggt gctaatcgat cccggcacat cagacgccga tctatggtca 840
ctatctcggc agcctaatga gccacttcga gacttcctcg cgaaattccg atccacccta 900
gccaaagtag aaggaatcaa cgacgtggcg actctctcgg ctctgaagaa agcactgtgg 960
tacaaatccg aatttcgaaa ggaactaaat ttgtccaaac cactgacaat ccgagacgcc 1020
ttgcaccgag cctcggatta cgtatcccat gaagaagaga tgaaactact agctaaaaga 1080
cccaaaccga ccaagcaaac gcctcacatc gataaacctc aacctggcgc tccgaatcac 1140 
aagaaaggcg cccaaggtgg gacattcgtt caccatgaag gacgaaattt ctccggagcc 1200 
cataattacc aggccgatac accccaaggc gaantgcccg agcccgaggg cgaggccgcg 1260 
gacgaggacg cggtcgggaa tcctacacgt ggataa 
(2) INFORMATION FOR SEQ ID NO: 408: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 431 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..431 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585048 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 408: 
Met Ala Gln Leu Gln Gln Lys Val Asn Asn Gln Glu Gln Ala Asn Arg
1 5 10 15

Ser Leu Val Gln Gln Leu Glu Ala Ala Thr Ser Gln Gly Gln Ile Arg
20 25 30

Thr Thr Arg Phe Ser Ala Arg His Leu Gln Asp Arg Arg Ala Ala Ala

35 40 45

Asp Leu Asn Pro Thr Arg Leu Val Phe His Thr Pro Gly Asn Thr Thr

50 55 60

Arg Pro Val Arg Arg Thr Ala Pro Glu Ile Gly Arg Asp Arg Thr Glu
65 70 75 80

Pro Ala Ile Leu Gly Asn Arg Glu Thr Asn Arg Asn Glu Pro Gln Leu

85 90 95

Pro Pro Pro Arg Ala Glu Val Thr Glu Ala Asp His Ile Asp Ala Ser

100 105 110

Asp Asn Glu Asp Ser Glu Glu Asn Ile Arg Trp Ala Glu Glu Tyr Ala

115 120 125

Arg Glu Gln Glu Ile Ser Ala Ile Lys Leu Ser Leu Ala Lys Ala Glu

130 135 140

Asn Glu Met Lys Leu Val Arg Ser Gln Met His Asn Ala Val Ser Ser
145 150 155 160

Ala Pro Asn Ile Asp Arg Asn Leu Glu Glu Ser His Asn Thr Pro Phe

165 170 175

Thr His Lys Ile Ser Asn Ala Ile Ile Ser Asp Pro Gly Lys Leu Arg

180 185 190

Ile Glu Tyr Phe Asn Gly Ser Ser Asp Pro Lys Gly His Leu Lys Ser

195 200 205

Phe Ile Ile Ser Val Ala Arg Val Lys Phe Arg Pro Glu Glu Arg Asp

210 215 220

Ala Gly Leu Cys His Leu Phe Val Glu His Leu Lys Gly Pro Thr Leu
225 230 235 240

Asp Trp Phe Ser Arg Leu Glu Gly Asn Ser Val Asp Ser Phe Gln Glu

245 250 255

Leu Ser Thr Leu Phe Leu Lys Gln Tyr Ser Val Leu Ile Asp Pro Gly

260 265 270

Thr Ser Asp Ala Asp Leu Trp Ser Leu Ser Arg Gln Pro Asn Glu Pro

275 280 285

Leu Arg Asp Phe Leu Ala Lys Phe Arg Ser Thr Leu Ala Lys Val Glu

290 295 300

Gly Ile Asn Asp Val Ala Thr Leu Ser Ala Leu Lys Lys Ala Leu Trp
305 310 315 320

Tyr Lys Ser Glu Phe Arg Lys Glu Leu Asn Leu Ser Lys Pro Leu Thr

325 330 335

Ile Arg Asp Ala Leu His Arg Ala Ser Asp Tyr Val Ser His Glu Glu

340 345 350

Glu Met Lys Leu Leu Ala Lys Arg Pro Lys Pro Thr Lys Gln Thr Pro

355 360 365

His Ile Asp Lys Pro Gln Pro Gly Ala Pro Asn His Lys Lys Gly Ala

370 375 380

Gln Gly Gly Thr Phe Val His His Glu Gly Arg Asn Phe Ser Gly Ala
385 390 395 400

His Asn Tyr Gln Ala Asp Thr Pro Gln Gly Glu Xaa Pro Glu Pro Glu

405 410 415 

Gly Glu Ala Ala Asp Glu Asp Ala Val Gly Asn Pro Thr Arg Gly 

420 425 430 

(2) INFORMATION FOR SEQ ID NO: 409: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 285 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..285

(D) OTHER INFORMATION: / Ceres Seq. ID 1585050
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 409:
Met Lys Leu Val Arg Ser Gln Met His Asn Ala Val Ser Ser Ala Pro
1 5 10 15

Asn Ile Asp Arg Asn Leu Glu Glu Ser His Asn Thr Pro Phe Thr His

20 25 30

Lys Ile Ser Asn Ala Ile Ile Ser Asp Pro Gly Lys Leu Arg Ile Glu

35 40 45

Tyr Phe Asn Gly Ser Ser Asp Pro Lys Gly His Leu Lys Ser Phe Ile

50 55 60

Ile Ser Val Ala Arg Val Lys Phe Arg Pro Glu Glu Arg Asp Ala Gly
65 70 75 80

Leu Cys His Leu Phe Val Glu His Leu Lys Gly Pro Thr Leu Asp Trp

85 90 95

Phe Ser Arg Leu Glu Gly Asn Ser Val Asp Ser Phe Gln Glu Leu Ser

100 105 110

Thr Leu Phe Leu Lys Gln Tyr Ser Val Leu Ile Asp Pro Gly Thr Ser

115 120 125

Asp Ala Asp Leu Trp Ser Leu Ser Arg Gln Pro Asn Glu Pro Leu Arg

130 135 140

Asp Phe Leu Ala Lys Phe Arg Ser Thr Leu Ala Lys Val Glu Gly Ile
145 150 155 160

Asn Asp Val Ala Thr Leu Ser Ala Leu Lys Lys Ala Leu Trp Tyr Lys

165 170 175

Ser Glu Phe Arg Lys Glu Leu Asn Leu Ser Lys Pro Leu Thr Ile Arg

180 185 190

Asp Ala Leu His Arg Ala Ser Asp Tyr Val Ser His Glu Glu Glu Met

195 200 205

Lys Leu Leu Ala Lys Arg Pro Lys Pro Thr Lys Gln Thr Pro His Ile

210 215 220

Asp Lys Pro Gln Pro Gly Ala Pro Asn His Lys Lys Gly Ala Gln Gly
225 230 235 240

Gly Thr Phe Val His His Glu Gly Arg Asn Phe Ser Gly Ala His Asn

245 250 255

Tyr Gln Ala Asp Thr Pro Gln Gly Glu Xaa Pro Glu Pro Glu Gly Glu

260 265 270 

Ala Ala Asp Glu Asp Ala Val Gly Asn Pro Thr Arg Gly 
275 280 285 

(2) INFORMATION FOR SEQ ID NO: 410:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1317 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY:

(B) LOCATION: 1..1317

(D) OTHER INFORMATION: / Ceres Seq. ID 1585207
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 410:
atggcgcaga agtttttcaa acgtggaaaa aaccgtcgaa ccactgcaaa caacctagga 60
aattctagag ttcgacgaac tatcttcgga tccccaaatt cccttccgcg aaatgcaaca 120
agtgatcaaa cgaccccgca tggtacaaat gcggagaatc aaccagaaga tggagaaacg 180
catcccgagg acgataacca gttcggtgac cagcacgagc aggaaaactt cgaccagctc 240
gaaaccatga aagaagtatg ccgagagttg aaagaaatga ggtcgaaatt ccaccaagct 300
acaagctcag agccggatat caaccgggtt atcgaagagg ctaggcgaac actgttcacc 360
ccgcgaattg cgagttttag aatcagagat tctcgaaaat tcaacttaga accatataat 420
ggtttggaag atccgaaagg ctatctcgca gcttttctaa tagccgctgg gcgagttgac 480
ctaaatgaag ccgatgaaga cgcaagcatt tcggacaccg acctctggaa tctatctcaa 540
gggccgaatg agacacttcg ggctttcatc accaaattta aaaacgtgct ttcaaagctc 600
ccaaggatct cacaacaatc ggctctgtcg gctttgcgaa agggattatg gtacgattcg 660
aggtttaagg aagatctgat acttcacaaa cctgatacta ttcaagacgc actatttcga 720
gcgaataatt ggatggaagt cgaagatgaa aaagaaagtt tcgcaaaaag aaataagcaa 780 
gcaaaaccaa cggtcacttt tccgcccaag aagttcgaac ctcgggaaaa tcagggacca 840 
aggaagtttg gttcacaacc attcaataac aacgtcggaa agcagtttca ggggaaagga 900 
aggtcaaata cttgggtccg agatggaagc tcatattgcg atatacaccg agtcactaga 960 
catctgacca aagactgcag tgtccttaag aggcatctcg cagagttatg ggccagtgga 1020 
gatctctcaa aattcaacat ggagaacttc atcaagcaac atcatgaatc aagggataat 1080 
ccagaggctc aaaactctaa aagaccaagg caggtgggcg aagaggaacc tagaacttca 1140 
aagggtaaaa ttaatgtaat tcttggagga tctaaacttt gtcgtgactc catcagtgaa 1200 
aacaagaagc atagacgcaa tgttcatttg aaatcaagct taagtgaaga agtagatttt 1260
caaggcactt cgatattgtt cgaagaagga aacgcaacat ctcggaagac ctcatga 
(2) INFORMATION FOR SEQ ID NO: 411: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 438 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/ KEY : peptide 

(B) LOCATION: 1..438 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585208 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 411: 
Met Ala Gln Lys Phe Phe Lys Arg Gly Lys Asn Arg Arg Thr Thr Ala
1 5 10 15

Asn Asn Leu Gly Asn Ser Arg Val Arg Arg Thr Ile Phe Gly Ser Pro

20 25 30

Asn Ser Leu Pro Arg Asn Ala Thr Ser Asp Gln Thr Thr Pro His Gly

35 40 45

Thr Asn Ala Glu Asn Gln Pro Glu Asp Gly Glu Thr His Pro Glu Asp

50 55 60

Asp Asn Gln Phe Gly Asp Gln His Glu Gln Glu Asn Phe Asp Gln Leu
65 70 75 80

Glu Thr Met Lys Glu Val Cys Arg Glu Leu Lys Glu Met Arg Ser Lys

85 90 95

Phe His Gln Ala Thr Ser Ser Glu Pro Asp Ile Asn Arg Val Ile Glu

100 105 110

Glu Ala Arg Arg Thr Leu Phe Thr Pro Arg Ile Ala Ser Phe Arg Ile

115 120 125

Arg Asp Ser Arg Lys Phe Asn Leu Glu Pro Tyr Asn Gly Leu Glu Asp

130 135 140

Pro Lys Gly Tyr Leu Ala Ala Phe Leu Ile Ala Ala Gly Arg Val Asp
145 150 155 160

Leu Asn Glu Ala Asp Glu Asp Ala Ser Ile Ser Asp Thr Asp Leu Trp

165 170 175

Asn Leu Ser Gln Gly Pro Asn Glu Thr Leu Arg Ala Phe Ile Thr Lys

180 185 190

Phe Lys Asn Val Leu Ser Lys Leu Pro Arg Ile Ser Gln Gln Ser Ala

195 200 205

Leu Ser Ala Leu Arg Lys Gly Leu Trp Tyr Asp Ser Arg Phe Lys Glu

210 215 220

Asp Leu Ile Leu His Lys Pro Asp Thr Ile Gln Asp Ala Leu Phe Arg
225 230 235 240

Ala Asn Asn Trp Met Glu Val Glu Asp Glu Lys Glu Ser Phe Ala Lys

245 250 255

Arg Asn Lys Gln Ala Lys Pro Thr Val Thr Phe Pro Pro Lys Lys Phe

260 265 270

Glu Pro Arg Glu Asn Gln Gly Pro Arg Lys Phe Gly Ser Gln Pro Phe

275 280 285

Asn Asn Asn Val Gly Lys Gln Phe Gln Gly Lys Gly Arg Ser Asn Thr
290 295 300

Trp Val Arg Asp Gly Ser Ser Tyr Cys Asp Ile His Arg Val Thr Arg
305 310 315 320

His Leu Thr Lys Asp Cys Ser Val Leu Lys Arg His Leu Ala Glu Leu

325 330 335

Trp Ala Ser Gly Asp Leu Ser Lys Phe Asn Met Glu Asn Phe Ile Lys

340 345 350

Gln His His Glu Ser Arg Asp Asn Pro Glu Ala Gln Asn Ser Lys Arg

355 360 365

Pro Arg Gln Val Gly Glu Glu Glu Pro Arg Thr Ser Lys Gly Lys Ile

370 375 380

Asn Val Ile Leu Gly Gly Ser Lys Leu Cys Arg Asp Ser Ile Ser Glu
385 390 395 400

Asn Lys Lys His Arg Arg Asn Val His Leu Lys Ser Ser Leu Ser Glu

405 410 415

Glu Val Asp Phe Gln Gly Thr Ser Ile Leu Phe Glu Glu Gly Asn Ala

420 425 430 

Thr Ser Arg Lys Thr Ser 
435 

(2) INFORMATION FOR SEQ ID NO: 412: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 356 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..356 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585210 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 412: 
Met Lys Glu Val Cys Arg Glu Leu Lys Glu Met Arg Ser Lys Phe His
1 5 10 15

Gln Ala Thr Ser Ser Glu Pro Asp Ile Asn Arg Val Ile Glu Glu Ala

20 25 30

Arg Arg Thr Leu Phe Thr Pro Arg Ile Ala Ser Phe Arg Ile Arg Asp

35 40 45

Ser Arg Lys Phe Asn Leu Glu Pro Tyr Asn Gly Leu Glu Asp Pro Lys

50 55 60

Gly Tyr Leu Ala Ala Phe Leu Ile Ala Ala Gly Arg Val Asp Leu Asn
65 70 75 80

Glu Ala Asp Glu Asp Ala Ser Ile Ser Asp Thr Asp Leu Trp Asn Leu

85 90 95

Ser Gln Gly Pro Asn Glu Thr Leu Arg Ala Phe Ile Thr Lys Phe Lys

100 105 110

Asn Val Leu Ser Lys Leu Pro Arg Ile Ser Gln Gln Ser Ala Leu Ser

115 120 125

Ala Leu Arg Lys Gly Leu Trp Tyr Asp Ser Arg Phe Lys Glu Asp Leu

130 135 140

Ile Leu His Lys Pro Asp Thr Ile Gln Asp Ala Leu Phe Arg Ala Asn
145 150 155 160

Asn Trp Met Glu Val Glu Asp Glu Lys Glu Ser Phe Ala Lys Arg Asn

165 170 175

Lys Gln Ala Lys Pro Thr Val Thr Phe Pro Pro Lys Lys Phe Glu Pro

180 185 190

Arg Glu Asn Gln Gly Pro Arg Lys Phe Gly Ser Gln Pro Phe Asn Asn

195 200 205

Asn Val Gly Lys Gln Phe Gln Gly Lys Gly Arg Ser Asn Thr Trp Val

210 215 220

Arg Asp Gly Ser Ser Tyr Cys Asp Ile His Arg Val Thr Arg His Leu
225 230 235 240

Thr Lys Asp Cys Ser Val Leu Lys Arg His Leu Ala Glu Leu Trp Ala
245 250 255

Ser Gly Asp Leu Ser Lys Phe Asn Met Glu Asn Phe Ile Lys Gln His

260 265 270

His Glu Ser Arg Asp Asn Pro Glu Ala Gln Asn Ser Lys Arg Pro Arg

275 280 285

Gln Val Gly Glu Glu Glu Pro Arg Thr Ser Lys Gly Lys Ile Asn Val

290 295 300

Ile Leu Gly Gly Ser Lys Leu Cys Arg Asp Ser Ile Ser Glu Asn Lys
305 310 315 320

Lys His Arg Arg Asn Val His Leu Lys Ser Ser Leu Ser Glu Glu Val

325 330 335

Asp Phe Gln Gly Thr Ser Ile Leu Phe Glu Glu Gly Asn Ala Thr Ser

340 345 350 

Arg Lys Thr Ser 
355 

(2) INFORMATION FOR SEQ ID NO: 413: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1277 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..1277 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585238 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 413: 
cagcattact tctattcctt ctgctcgagc aactctttta ccgtgtcaag aaacgtaacc 60
tccctggccc tctcttcgtc tttccgatca ttggaaatgt tgttgcactt atccgcgacc 120
caacttcctt ctgggacaag caatccgcga tggcngacac gtccgttggc ctctccgtca 180
actaccttat cggaaaattc atcatataca ttaaagacgc agagctctcc aataaagtct 240
tctccaacat tcgtcccgat gctttccaac ttgttggaca tccattcgga aagaagctct 300
ttggtgatca cagccttatc tttatgtttg gcgagaatca caaatccgtt cgtcgtcaag 360
tcgctcctaa cttcacccgc aagccactct ctgcttattc ttccctccag caaatagtta 420
tcctccgtca tttacggcag tgggaggaaa gtttttctag tggatctcgt ccggtttcga 480
tgagacaact catccgtgaa ctcaacctcg agacttccca aacggttttt gttggaccat 540
acctcgacaa ggaagtcaag aacacgatcc gtgatgatta caatgtgttc aatcctggaa 600
caatggcgct cccgatcgac ctccctggct tcacgttcgg ggaggctcgt cgggcggtag 660
tgctcgttga ttttctgttt gcctcacaag acgcctccac gtcatcactc ctctgggcag 720
tggtgctgct tgagtcggag ccggaagtgc taagaagagt gagggaggac gttgcaagat 780
tttggtcgcc tgagtccaac gagtcgatca caaccgatca gctcgcggag atgaagtaca 840
ctcgggctgt ggcgcgtgag gtcctaaggt accgaccacc agcaagtatg gtcccacatg 900
ttgctgttag tgacttccgt ctcacggaat cgtacacaat ccctaaaggt acaattgtgt 960
ttccttccct ttttgacgcc tcgtttcaag ggtttactga accagaccgg ttcgatccag 1020
accggtttag cgagacaagg caagaggatg aggtgttcaa acgcaatttc ctaacttttg 1080
gaattggctc gcaccagtgt gtgggccaac gttatgcgat gaaccacctc gtgctcttca 1140
ttgccatgtt ctcctcgatg tttgatttca agagggtacg atcagatggt tgcgatgaga 1200
ttgtgcatat ccccacgatg tcgcccaagg acgggtgcac ggtgttcttg tctagccgcc 1260
tcgttacctc tccttga

(2) INFORMATION FOR SEQ ID NO: 414:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 424 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..424

(D) OTHER INFORMATION: / Ceres Seq. ID 1585239
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 414:
Ala Leu Leu Leu Phe Leu Leu Leu Glu Gln Leu Phe Tyr Arg Val Lys

1 5 10 15

Lys Arg Asn Leu Pro Gly Pro Leu Phe Val Phe Pro Ile Ile Gly Asn

20 25 30

Val Val Ala Leu Ile Arg Asp Pro Thr Ser Phe Trp Asp Lys Gln Ser

35 40 45

Ala Met Xaa Asp Thr Ser Val Gly Leu Ser Val Asn Tyr Leu Ile Gly

50 55 60

Lys Phe Ile Ile Tyr Ile Lys Asp Ala Glu Leu Ser Asn Lys Val Phe
65 70 75 80

Ser Asn Ile Arg Pro Asp Ala Phe Gln Leu Val Gly His Pro Phe Gly

85 90 95

Lys Lys Leu Phe Gly Asp His Ser Leu Ile Phe Met Phe Gly Glu Asn

100 105 110

His Lys Ser Val Arg Arg Gln Val Ala Pro Asn Phe Thr Arg Lys Pro

115 120 125

Leu Ser Ala Tyr Ser Ser Leu Gln Gln Ile Val Ile Leu Arg His Leu

130 135 140

Arg Gln Trp Glu Glu Ser Phe Ser Ser Gly Ser Arg Pro Val Ser Met
145 150 155 160

Arg Gln Leu Ile Arg Glu Leu Asn Leu Glu Thr Ser Gln Thr Val Phe

165 170 175

Val Gly Pro Tyr Leu Asp Lys Glu Val Lys Asn Thr Ile Arg Asp Asp

180 185 190

Tyr Asn Val Phe Asn Pro Gly Thr Met Ala Leu Pro Ile Asp Leu Pro

195 200 205

Gly Phe Thr Phe Gly Glu Ala Arg Arg Ala Val Val Leu Val Asp Phe

210 215 220

Leu Phe Ala Ser Gln Asp Ala Ser Thr Ser Ser Leu Leu Trp Ala Val
225 230 235 240

Val Leu Leu Glu Ser Glu Pro Glu Val Leu Arg Arg Val Arg Glu Asp

245 250 255

Val Ala Arg Phe Trp Ser Pro Glu Ser Asn Glu Ser Ile Thr Thr Asp

260 265 270

Gln Leu Ala Glu Met Lys Tyr Thr Arg Ala Val Ala Arg Glu Val Leu

275 280 285

Arg Tyr Arg Pro Pro Ala Ser Met Val Pro His Val Ala Val Ser Asp

290 295 300

Phe Arg Leu Thr Glu Ser Tyr Thr Ile Pro Lys Gly Thr Ile Val Phe
305 310 315 320

Pro Ser Leu Phe Asp Ala Ser Phe Gln Gly Phe Thr Glu Pro Asp Arg

325 330 335

Phe Asp Pro Asp Arg Phe Ser Glu Thr Arg Gln Glu Asp Glu Val Phe

340 345 350

Lys Arg Asn Phe Leu Thr Phe Gly Ile Gly Ser His Gln Cys Val Gly

355 360 365

Gln Arg Tyr Ala Met Asn His Leu Val Leu Phe Ile Ala Met Phe Ser

370 375 380

Ser Met Phe Asp Phe Lys Arg Val Arg Ser Asp Gly Cys Asp Glu Ile
385 390 395 400

Val His Ile Pro Thr Met Ser Pro Lys Asp Gly Cys Thr Val Phe Leu

405 410 415 

Ser Ser Arg Leu Val Thr Ser Pro 
420 

(2) INFORMATION FOR SEQ ID NO: 415: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..375

(D) OTHER INFORMATION: / Ceres Seq. ID 1585240
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 415:
Met Xaa Asp Thr Ser Val Gly Leu Ser Val Asn Tyr Leu Ile Gly Lys
1 5 10 15

Phe Ile Ile Tyr Ile Lys Asp Ala Glu Leu Ser Asn Lys Val Phe Ser

20 25 30

Asn Ile Arg Pro Asp Ala Phe Gln Leu Val Gly His Pro Phe Gly Lys

35 40 45

Lys Leu Phe Gly Asp His Ser Leu Ile Phe Met Phe Gly Glu Asn His

50 55 60

Lys Ser Val Arg Arg Gln Val Ala Pro Asn Phe Thr Arg Lys Pro Leu
65 70 75 80

Ser Ala Tyr Ser Ser Leu Gln Gln Ile Val Ile Leu Arg His Leu Arg

85 90 95

Gln Trp Glu Glu Ser Phe Ser Ser Gly Ser Arg Pro Val Ser Met Arg

100 105 110

Gln Leu Ile Arg Glu Leu Asn Leu Glu Thr Ser Gln Thr Val Phe Val

115 120 125

Gly Pro Tyr Leu Asp Lys Glu Val Lys Asn Thr Ile Arg Asp Asp Tyr

130 135 140

Asn Val Phe Asn Pro Gly Thr Met Ala Leu Pro Ile Asp Leu Pro Gly
145 150 155 160

Phe Thr Phe Gly Glu Ala Arg Arg Ala Val Val Leu Val Asp Phe Leu

165 170 175

Phe Ala Ser Gln Asp Ala Ser Thr Ser Ser Leu Leu Trp Ala Val Val

180 185 190

Leu Leu Glu Ser Glu Pro Glu Val Leu Arg Arg Val Arg Glu Asp Val

195 200 205

Ala Arg Phe Trp Ser Pro Glu Ser Asn Glu Ser Ile Thr Thr Asp Gln

210 215 220

Leu Ala Glu Met Lys Tyr Thr Arg Ala Val Ala Arg Glu Val Leu Arg
225 230 235 240

Tyr Arg Pro Pro Ala Ser Met Val Pro His Val Ala Val Ser Asp Phe

245 250 255

Arg Leu Thr Glu Ser Tyr Thr Ile Pro Lys Gly Thr Ile Val Phe Pro

260 265 270

Ser Leu Phe Asp Ala Ser Phe Gln Gly Phe Thr Glu Pro Asp Arg Phe

275 280 285

Asp Pro Asp Arg Phe Ser Glu Thr Arg Gln Glu Asp Glu Val Phe Lys

290 295 300

Arg Asn Phe Leu Thr Phe Gly Ile Gly Ser His Gln Cys Val Gly Gln
305 310 315 320

Arg Tyr Ala Met Asn His Leu Val Leu Phe Ile Ala Met Phe Ser Ser

325 330 335

Met Phe Asp Phe Lys Arg Val Arg Ser Asp Gly Cys Asp Glu Ile Val

340 345 350

His Ile Pro Thr Met Ser Pro Lys Asp Gly Cys Thr Val Phe Leu Ser

355 360 365 

Ser Arg Leu Val Thr Ser Pro 
370 375 
(2) INFORMATION FOR SEQ ID NO: 416: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 317 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..317

(D) OTHER INFORMATION: / Ceres Seq. ID 1585241
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 416:
Met Phe Gly Glu Asn His Lys Ser Val Arg Arg Gln Val Ala Pro Asn
1 5 10 15

Phe Thr Arg Lys Pro Leu Ser Ala Tyr Ser Ser Leu Gln Gln Ile Val

20 25 30

Ile Leu Arg His Leu Arg Gln Trp Glu Glu Ser Phe Ser Ser Gly Ser

35 40 45

Arg Pro Val Ser Met Arg Gln Leu Ile Arg Glu Leu Asn Leu Glu Thr

50 55 60

Ser Gln Thr Val Phe Val Gly Pro Tyr Leu Asp Lys Glu Val Lys Asn
65 70 75 80

Thr Ile Arg Asp Asp Tyr Asn Val Phe Asn Pro Gly Thr Met Ala Leu

85 90 95

Pro Ile Asp Leu Pro Gly Phe Thr Phe Gly Glu Ala Arg Arg Ala Val

100 105 110

Val Leu Val Asp Phe Leu Phe Ala Ser Gln Asp Ala Ser Thr Ser Ser

115 120 125

Leu Leu Trp Ala Val Val Leu Leu Glu Ser Glu Pro Glu Val Leu Arg

130 135 140

Arg Val Arg Glu Asp Val Ala Arg Phe Trp Ser Pro Glu Ser Asn Glu
145 150 155 160

Ser Ile Thr Thr Asp Gln Leu Ala Glu Met Lys Tyr Thr Arg Ala Val

165 170 175

Ala Arg Glu Val Leu Arg Tyr Arg Pro Pro Ala Ser Met Val Pro His

180 185 190

Val Ala Val Ser Asp Phe Arg Leu Thr Glu Ser Tyr Thr Ile Pro Lys

195 200 205

Gly Thr Ile Val Phe Pro Ser Leu Phe Asp Ala Ser Phe Gln Gly Phe

210 215 220

Thr Glu Pro Asp Arg Phe Asp Pro Asp Arg Phe Ser Glu Thr Arg Gln
225 230 235 240

Glu Asp Glu Val Phe Lys Arg Asn Phe Leu Thr Phe Gly Ile Gly Ser

245 250 255

His Gln Cys Val Gly Gln Arg Tyr Ala Met Asn His Leu Val Leu Phe

260 265 270

Ile Ala Met Phe Ser Ser Met Phe Asp Phe Lys Arg Val Arg Ser Asp

275 280 285

Gly Cys Asp Glu Ile Val His Ile Pro Thr Met Ser Pro Lys Asp Gly

290 295 300 

Cys Thr Val Phe Leu Ser Ser Arg Leu Val Thr Ser Pro 
305 310 315 

(2) INFORMATION FOR SEQ ID NO: 417: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 430 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..430 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585308 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 417: 
tatcattctt gtaatctacg ccatcatcga ttgcattctc cgaccattct tacgaacatg 60
tctcgaccta gatctagaaa tcggagtgca aagaggccaa caacgtgcta gaatcattac 120
ctaccacaca atcatcccga ctggtctacg gttaccagat ttcgagagaa aaaaaaagcg 180
cgggctgaaa caaagcgcga tcgaaaccct actaccgaag ttgcttgttg gacaaggcaa 240
ccatgaggag gatgaagaaa agtcactaga atcaagagaa tgcgcgattt gtctcagtgg 300
ttatgtcgtt aatgaagaat gcagagtatt tcctgtttgc agacatatct atcatgcgct 360
ttgtatcgac gcttggctta agaatcatct cacatgtcct acttgtcgta aagatctacc 420 
agaatcatga 

(2) INFORMATION FOR SEQ ID NO: 418:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME /KEY : peptide 

(B) LOCATION: 1..142 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585309 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 418: 



Ile Ile Leu Val Ile Tyr Ala Ile Ile Asp Cys Ile Leu Arg Pro Phe

1 5 10 15

Leu Arg Thr Cys Leu Asp Leu Asp Leu Glu Ile Gly Val Gln Arg Gly

20 25 30

Gln Gln Arg Ala Arg Ile Ile Thr Tyr His Thr Ile Ile Pro Thr Gly

35 40 45

Leu Arg Leu Pro Asp Phe Glu Arg Lys Lys Lys Arg Gly Leu Lys Gln

50 55 60

Ser Ala Ile Glu Thr Leu Leu Pro Lys Leu Leu Val Gly Gln Gly Asn

65 70 75 80

His Glu Glu Asp Glu Glu Lys Ser Leu Glu Ser Arg Glu Cys Ala Ile

85 90 95

Cys Leu Ser Gly Tyr Val Val Asn Glu Glu Cys Arg Val Phe Pro Val

100 105 110

Cys Arg His Ile Tyr His Ala Leu Cys Ile Asp Ala Trp Leu Lys Asn

115 120 125

His Leu Thr Cys Pro Thr Cys Arg Lys Asp Leu Pro Glu Ser

130 135 140










(2) INFORMATION FOR SEQ ID NO: 419:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 484 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..484 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585351 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 419: 
cttcaacaga aggtgaacaa tcaagaacag gcaaatcgat ccttggtgca gcaactcgaa 60
gcagctacct cccaaggaca gatnaggact actcgtttca gcgcgaggca tcttcaggat 120
cgacgggcag cagcagatct caatcccaca cggctcgtat tccacacacc tggcaatacg 180
acaaggcccg tccgcagaac cgcaccggaa atcggaagag accgaaccga gccggcaatt 240
ttgggaaatc gggaaacaaa tcgaaacgaa ccgcaactcc ctcctccccg agcagaggtt 300
accgaggccg atcatatcga cgcctcggac aatgaagact ccgaggagaa tattaggtgg 360
gctgaagagt acgccagaga acaggaaata agcgccatca agctatccct agccaaggca 420
gagaacgaga tgaagctcgt gagatcccaa atgcataacg cntctcctcg gcaccaaaca 480
tcga

(2) INFORMATION FOR SEQ ID NO: 420:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 161 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



Attorney Docket No. 2750-1237P 
Client Docket No. 80146.003 



Table 2 
Page 259 



(ix) FEATURE: 

(A) NAME /KEY: peptide 

(B) LOCATION: 1..161 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585352 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 420: 



Leu Gln Gln Lys Val Asn Asn Gln Glu Gln Ala Asn Arg Ser Leu Val

1 5 10 15

Gln Gln Leu Glu Ala Ala Thr Ser Gln Gly Gln Xaa Arg Thr Thr Arg

20 25 30

Phe Ser Ala Arg His Leu Gln Asp Arg Arg Ala Ala Ala Asp Leu Asn

35 40 45

Pro Thr Arg Leu Val Phe His Thr Pro Gly Asn Thr Thr Arg Pro Val

50 55 60

Arg Arg Thr Ala Pro Glu Ile Gly Arg Asp Arg Thr Glu Pro Ala Ile

65 70 75 80

Leu Gly Asn Arg Glu Thr Asn Arg Asn Glu Pro Gln Leu Pro Pro Pro

85 90 95

Arg Ala Glu Val Thr Glu Ala Asp His Ile Asp Ala Ser Asp Asn Glu

100 105 110

Asp Ser Glu Glu Asn Ile Arg Trp Ala Glu Glu Tyr Ala Arg Glu Gln

115 120 125

Glu Ile Ser Ala Ile Lys Leu Ser Leu Ala Lys Ala Glu Asn Glu Met

130 135 140

Lys Leu Val Arg Ser Gln Met His Asn Xaa Ser Pro Arg His Gln Thr

145 150 155 160

Ser



(2) INFORMATION FOR SEQ ID NO: 421: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 110 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide 

(B) LOCATION: 1..110 







(D) OTHER INFORMATION: / Ceres Seq. ID 1585353
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 421:












Ser Thr Glu Gly Glu Gln Ser Arg Thr Gly Lys Ser Ile Leu Gly Ala

1 5 10 15

Ala Thr Arg Ser Ser Tyr Leu Pro Arg Thr Asp Xaa Asp Tyr Ser Phe

20 25 30

Gln Arg Glu Ala Ser Ser Gly Ser Thr Gly Ser Ser Arg Ser Gln Ser

35 40 45

His Thr Ala Arg Ile Pro His Thr Trp Gln Tyr Asp Lys Ala Arg Pro

50 55 60

Gln Asn Arg Thr Gly Asn Arg Lys Arg Pro Asn Arg Ala Gly Asn Phe

65 70 75 80

Gly Lys Ser Gly Asn Lys Ser Lys Arg Thr Ala Thr Pro Ser Ser Pro

85 90 95

Ser Arg Gly Tyr Arg Gly Arg Ser Tyr Arg Arg Leu Gly Gln

100 105 110






(2) INFORMATION FOR SEQ ID NO: 422:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2250 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 
(A) NAME/KEY: -

(B) LOCATION: 1..2250

(D) OTHER INFORMATION: / Ceres Seq. ID 1585458
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 422:
gaatcatcat ggtgagaaag agaagaacgg atgctccatc tgaaggaggt gaaggctctg 60 
ggtctcgtga agctggtcca gtctcaggtg gtggacgtgg ttcacagcga ggtggtttcc 120 
agcagggagg aggacaacac caaggtggaa ggggttatac tcctcaacct caacagggag 180 
gtcgtggtgg tcgtggatat gggcaaccac cacaacagca acaacagtat ggaggaccac 240 

aagagtacca aggaagagga agaggaggac ctcctcatca aggaggtcga ggagggtatg 300
gcggtggccg tggaggtgga ccttcttctg gaccaccgca gagacaatca gttcccgagc 360
tgcatcaagc tacctcacct acttatcaag cggtgtcttc tcagcctaca ctgtctgagg 420
tgagtcctac ccaggtacca gaacctactg ttctggctca gcaatttgaa caactctctg 480
ttgaacaagg agctcccagt caggcaatcc agcctatacc gtcttctagc aaggctttca 540
agtttccaat gaggcctggt aaaggacaga gtggaaagcg ttgcattgtg aaggctaacc 600
atttctttgc tgaactgcct gataaggatt tgcaccatta tgatgttacc attactccgg 660
aagttacatc aaggggtgtc aatcgtgctg tgatgaaaca acttgttgat aattatcgtg 720
attctcacct tggaagtcgt cttccagcgt atgatggtcg aaaaagtctt tacactgctg 780
gtccacttcc ctttaactcc aaggagttca gaatcaatct tcttgacgaa gaagtagggg 840
ctggaggtca aaggggaaac aatcagatgc cccacaggaa gctctgcagg ttcttgacat 900
tgttcttcgt gagctgccga cctctaggta ctggttactc atcctatgga tgtttgcttc 960
tttactacca tgcttatatg tcatcgacag ccttcataga ggcaaaccct gtgattcagt 1020
ttgtctgtga tttgcttaac cgggatattt cttctcgacc tttatctgat gctgatcgtg 1080
ttaagataaa aaaggctctt agaggtgtca aagttgaagt gactcatcga ggaaacatgc 1140 
gccggaagta ccgcatttcc ggtttgactg ctgtggccac tcgggaattg acattcccag 1200 
tagatgaaag aaatactcag aaatctgttg tagaatactt ccacgaaaca tatggttttc 1260 
gcattcagca cactcaacta ccatgcttgc aagttgggaa ttctaatagg cctaattact 1320 
taccaatgga ggtatgcaag attgttgaag gccagcggta ttccaaaaga ttgaatgaga 1380 
gacagatcac tgctttgctg aaggttacct gtcagcgccc gatagatcga gaaaaagata 1440 
tcttacagac ggtgcaactc aatgattatg ctaaagataa ttatgctcaa gagtttggca 1500 
tcaaaataag tacttctctg gcttctgttg aggctcgtat actgcctcct ccatggctta 1560 
agtaccacga gtctggaagg gaagggactt gtctgccaca agttggtcaa tggaacatga 1620 
tgaataagaa aatgatcaat ggtggaacgg tgaataattg gatctgcatc aacttttcta 1680 
ggcaagtgca ggacaatcta gcgcgtacat tttgtcagga acttgctcaa atgtgttacg 1740 
tatctggcat ggcatttaat ccggaaccag tcctcccacc agtcagtgct cgccctgagc 1800 
aagtagagaa ggtcttgaag actagatatc atgatgccac atcaaaactc tcccaaggaa 1860
aagaaattga tctgcttatt gtcattctgc ccgataataa tggatcatta tacggtgatt 1920 
tgaaacgcat atgtgagact gaactcggca tagtctctca atgttgcctg acaaagcatg 1980 
tctttaagat gagcaaacaa tacatggcta atgttgcgct gaagattaat gtgaaggttg 2040 
gaggaagaaa cacagtgctt gttgatgctc tatctaggcg gattcctcta gtcagtgatc 2100 
gacccaccat tatatttggt gctgatgtta cccaccctca ccctggagag gattcaagcc 2160 
catctattgc tgctgttgtg gcatctcagg attggcctga aatcactaaa tatgctggat 2220 
tagtttgcgc tcaagcgcat aggcaggaga 
(2) INFORMATION FOR SEQ ID NO: 423:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 749 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..749

(D) OTHER INFORMATION: / Ceres Seq. ID 1585459 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 423: 
Ile Ile Met Val Arg Lys Arg Arg Thr Asp Ala Pro Ser Glu Gly Gly
1 5 10 15

Glu Gly Ser Gly Ser Arg Glu Ala Gly Pro Val Ser Gly Gly Gly Arg
20 25 30

Gly Ser Gln Arg Gly Gly Phe Gln Gln Gly Gly Gly Gln His Gln Gly
35 40 45

Gly Arg Gly Tyr Thr Pro Gln Pro Gln Gln Gly Gly Arg Gly Gly Arg
50 55 60

Gly Tyr Gly Gln Pro Pro Gln Gln Gln Gln Gln Tyr Gly Gly Pro Gln
65 70 75 80

Glu Tyr Gln Gly Arg Gly Arg Gly Gly Pro Pro His Gln Gly Gly Arg
85 90 95

Gly Gly Tyr Gly Gly Gly Arg Gly Gly Gly Pro Ser Ser Gly Pro Pro
100 105 110

Gln Arg Gln Ser Val Pro Glu Leu His Gln Ala Thr Ser Pro Thr Tyr
115 120 125

Gln Ala Val Ser Ser Gln Pro Thr Leu Ser Glu Val Ser Pro Thr Gln
130 135 140

Val Pro Glu Pro Thr Val Leu Ala Gln Gln Phe Glu Gln Leu Ser Val
145 150 155 160

Glu Gln Gly Ala Pro Ser Gln Ala Ile Gln Pro Ile Pro Ser Ser Ser
165 170 175

Lys Ala Phe Lys Phe Pro Met Arg Pro Gly Lys Gly Gln Ser Gly Lys
180 185 190

Arg Cys Ile Val Lys Ala Asn His Phe Phe Ala Glu Leu Pro Asp Lys
195 200 205

Asp Leu His His Tyr Asp Val Thr Ile Thr Pro Glu Val Thr Ser Arg
210 215 220

Gly Val Asn Arg Ala Val Met Lys Gln Leu Val Asp Asn Tyr Arg Asp
225 230 235 240

Ser His Leu Gly Ser Arg Leu Pro Ala Tyr Asp Gly Arg Lys Ser Leu
245 250 255

Tyr Thr Ala Gly Pro Leu Pro Phe Asn Ser Lys Glu Phe Arg Ile Asn
260 265 270

Leu Leu Asp Glu Glu Val Gly Ala Gly Gly Gln Arg Gly Asn Asn Gln
275 280 285

Met Pro His Arg Lys Leu Cys Arg Phe Leu Thr Leu Phe Phe Val Ser
290 295 300

Cys Arg Pro Leu Gly Thr Gly Tyr Ser Ser Tyr Gly Cys Leu Leu Leu
305 310 315 320

Tyr Tyr His Ala Tyr Met Ser Ser Thr Ala Phe Ile Glu Ala Asn Pro
325 330 335

Val Ile Gln Phe Val Cys Asp Leu Leu Asn Arg Asp Ile Ser Ser Arg
340 345 350

Pro Leu Ser Asp Ala Asp Arg Val Lys Ile Lys Lys Ala Leu Arg Gly
355 360 365

Val Lys Val Glu Val Thr His Arg Gly Asn Met Arg Arg Lys Tyr Arg
370 375 380

Ile Ser Gly Leu Thr Ala Val Ala Thr Arg Glu Leu Thr Phe Pro Val
385 390 395 400

Asp Glu Arg Asn Thr Gln Lys Ser Val Val Glu Tyr Phe His Glu Thr
405 410 415

Tyr Gly Phe Arg Ile Gln His Thr Gln Leu Pro Cys Leu Gln Val Gly
420 425 430

Asn Ser Asn Arg Pro Asn Tyr Leu Pro Met Glu Val Cys Lys Ile Val
435 440 445

Glu Gly Gln Arg Tyr Ser Lys Arg Leu Asn Glu Arg Gln Ile Thr Ala
450 455 460

Leu Leu Lys Val Thr Cys Gln Arg Pro Ile Asp Arg Glu Lys Asp Ile
465 470 475 480

Leu Gln Thr Val Gln Leu Asn Asp Tyr Ala Lys Asp Asn Tyr Ala Gln
485 490 495

Glu Phe Gly Ile Lys Ile Ser Thr Ser Leu Ala Ser Val Glu Ala Arg
500 505 510

Ile Leu Pro Pro Pro Trp Leu Lys Tyr His Glu Ser Gly Arg Glu Gly
515 520 525

Thr Cys Leu Pro Gln Val Gly Gln Trp Asn Met Met Asn Lys Lys Met
530 535 540

Ile Asn Gly Gly Thr Val Asn Asn Trp Ile Cys Ile Asn Phe Ser Arg
545 550 555 560

Gln Val Gln Asp Asn Leu Ala Arg Thr Phe Cys Gln Glu Leu Ala Gln
565 570 575

Met Cys Tyr Val Ser Gly Met Ala Phe Asn Pro Glu Pro Val Leu Pro
580 585 590

Pro Val Ser Ala Arg Pro Glu Gln Val Glu Lys Val Leu Lys Thr Arg
595 600 605

Tyr His Asp Ala Thr Ser Lys Leu Ser Gln Gly Lys Glu Ile Asp Leu
610 615 620

Leu Ile Val Ile Leu Pro Asp Asn Asn Gly Ser Leu Tyr Gly Asp Leu
625 630 635 640

Lys Arg Ile Cys Glu Thr Glu Leu Gly Ile Val Ser Gln Cys Cys Leu
645 650 655

Thr Lys His Val Phe Lys Met Ser Lys Gln Tyr Met Ala Asn Val Ala
660 665 670

Leu Lys Ile Asn Val Lys Val Gly Gly Arg Asn Thr Val Leu Val Asp
675 680 685

Ala Leu Ser Arg Arg Ile Pro Leu Val Ser Asp Arg Pro Thr Ile Ile
690 695 700

Phe Gly Ala Asp Val Thr His Pro His Pro Gly Glu Asp Ser Ser Pro
705 710 715 720

Ser Ile Ala Ala Val Val Ala Ser Gln Asp Trp Pro Glu Ile Thr Lys
725 730 735

Tyr Ala Gly Leu Val Cys Ala Gln Ala His Arg Gln Glu
740 745
(2) INFORMATION FOR SEQ ID NO: 424: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 747 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..747

(D) OTHER INFORMATION: / Ceres Seq. ID 1585460 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 424: 
Met Val Arg Lys Arg Arg Thr Asp Ala Pro Ser Glu Gly Gly Glu Gly
1 5 10 15

Ser Gly Ser Arg Glu Ala Gly Pro Val Ser Gly Gly Gly Arg Gly Ser
20 25 30

Gln Arg Gly Gly Phe Gln Gln Gly Gly Gly Gln His Gln Gly Gly Arg
35 40 45

Gly Tyr Thr Pro Gln Pro Gln Gln Gly Gly Arg Gly Gly Arg Gly Tyr
50 55 60

Gly Gln Pro Pro Gln Gln Gln Gln Gln Tyr Gly Gly Pro Gln Glu Tyr
65 70 75 80

Gln Gly Arg Gly Arg Gly Gly Pro Pro His Gln Gly Gly Arg Gly Gly
85 90 95

Tyr Gly Gly Gly Arg Gly Gly Gly Pro Ser Ser Gly Pro Pro Gln Arg
100 105 110

Gln Ser Val Pro Glu Leu His Gln Ala Thr Ser Pro Thr Tyr Gln Ala
115 120 125

Val Ser Ser Gln Pro Thr Leu Ser Glu Val Ser Pro Thr Gln Val Pro
130 135 140

Glu Pro Thr Val Leu Ala Gln Gln Phe Glu Gln Leu Ser Val Glu Gln
145 150 155 160

Gly Ala Pro Ser Gln Ala Ile Gln Pro Ile Pro Ser Ser Ser Lys Ala
165 170 175

Phe Lys Phe Pro Met Arg Pro Gly Lys Gly Gln Ser Gly Lys Arg Cys






180 185 190

Ile Val Lys Ala Asn His Phe Phe Ala Glu Leu Pro Asp Lys Asp Leu
195 200 205

His His Tyr Asp Val Thr Ile Thr Pro Glu Val Thr Ser Arg Gly Val
210 215 220

Asn Arg Ala Val Met Lys Gln Leu Val Asp Asn Tyr Arg Asp Ser His
225 230 235 240

Leu Gly Ser Arg Leu Pro Ala Tyr Asp Gly Arg Lys Ser Leu Tyr Thr
245 250 255

Ala Gly Pro Leu Pro Phe Asn Ser Lys Glu Phe Arg Ile Asn Leu Leu
260 265 270

Asp Glu Glu Val Gly Ala Gly Gly Gln Arg Gly Asn Asn Gln Met Pro
275 280 285

His Arg Lys Leu Cys Arg Phe Leu Thr Leu Phe Phe Val Ser Cys Arg
290 295 300

Pro Leu Gly Thr Gly Tyr Ser Ser Tyr Gly Cys Leu Leu Leu Tyr Tyr
305 310 315 320

His Ala Tyr Met Ser Ser Thr Ala Phe Ile Glu Ala Asn Pro Val Ile
325 330 335

Gln Phe Val Cys Asp Leu Leu Asn Arg Asp Ile Ser Ser Arg Pro Leu
340 345 350

Ser Asp Ala Asp Arg Val Lys Ile Lys Lys Ala Leu Arg Gly Val Lys
355 360 365

Val Glu Val Thr His Arg Gly Asn Met Arg Arg Lys Tyr Arg Ile Ser
370 375 380

Gly Leu Thr Ala Val Ala Thr Arg Glu Leu Thr Phe Pro Val Asp Glu
385 390 395 400

Arg Asn Thr Gln Lys Ser Val Val Glu Tyr Phe His Glu Thr Tyr Gly
405 410 415

Phe Arg Ile Gln His Thr Gln Leu Pro Cys Leu Gln Val Gly Asn Ser
420 425 430

Asn Arg Pro Asn Tyr Leu Pro Met Glu Val Cys Lys Ile Val Glu Gly
435 440 445

Gln Arg Tyr Ser Lys Arg Leu Asn Glu Arg Gln Ile Thr Ala Leu Leu
450 455 460

Lys Val Thr Cys Gln Arg Pro Ile Asp Arg Glu Lys Asp Ile Leu Gln
465 470 475 480

Thr Val Gln Leu Asn Asp Tyr Ala Lys Asp Asn Tyr Ala Gln Glu Phe
485 490 495

Gly Ile Lys Ile Ser Thr Ser Leu Ala Ser Val Glu Ala Arg Ile Leu
500 505 510

Pro Pro Pro Trp Leu Lys Tyr His Glu Ser Gly Arg Glu Gly Thr Cys
515 520 525

Leu Pro Gln Val Gly Gln Trp Asn Met Met Asn Lys Lys Met Ile Asn
530 535 540

Gly Gly Thr Val Asn Asn Trp Ile Cys Ile Asn Phe Ser Arg Gln Val
545 550 555 560

Gln Asp Asn Leu Ala Arg Thr Phe Cys Gln Glu Leu Ala Gln Met Cys
565 570 575

Tyr Val Ser Gly Met Ala Phe Asn Pro Glu Pro Val Leu Pro Pro Val
580 585 590

Ser Ala Arg Pro Glu Gln Val Glu Lys Val Leu Lys Thr Arg Tyr His
595 600 605

Asp Ala Thr Ser Lys Leu Ser Gln Gly Lys Glu Ile Asp Leu Leu Ile
610 615 620

Val Ile Leu Pro Asp Asn Asn Gly Ser Leu Tyr Gly Asp Leu Lys Arg
625 630 635 640

Ile Cys Glu Thr Glu Leu Gly Ile Val Ser Gln Cys Cys Leu Thr Lys
645 650 655

His Val Phe Lys Met Ser Lys Gln Tyr Met Ala Asn Val Ala Leu Lys
660 665 670

Ile Asn Val Lys Val Gly Gly Arg Asn Thr Val Leu Val Asp Ala Leu
675 680 685

Ser Arg Arg Ile Pro Leu Val Ser Asp Arg Pro Thr Ile Ile Phe Gly
690 695 700

Ala Asp Val Thr His Pro His Pro Gly Glu Asp Ser Ser Pro Ser Ile
705 710 715 720

Ala Ala Val Val Ala Ser Gln Asp Trp Pro Glu Ile Thr Lys Tyr Ala
725 730 735

Gly Leu Val Cys Ala Gln Ala His Arg Gln Glu
740 745
(2) INFORMATION FOR SEQ ID NO: 425: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 567 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..567 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585461 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 425: 
Met Arg Pro Gly Lys Gly Gln Ser Gly Lys Arg Cys Ile Val Lys Ala
1 5 10 15

Asn His Phe Phe Ala Glu Leu Pro Asp Lys Asp Leu His His Tyr Asp
20 25 30

Val Thr Ile Thr Pro Glu Val Thr Ser Arg Gly Val Asn Arg Ala Val
35 40 45

Met Lys Gln Leu Val Asp Asn Tyr Arg Asp Ser His Leu Gly Ser Arg
50 55 60

Leu Pro Ala Tyr Asp Gly Arg Lys Ser Leu Tyr Thr Ala Gly Pro Leu
65 70 75 80

Pro Phe Asn Ser Lys Glu Phe Arg Ile Asn Leu Leu Asp Glu Glu Val
85 90 95

Gly Ala Gly Gly Gln Arg Gly Asn Asn Gln Met Pro His Arg Lys Leu
100 105 110

Cys Arg Phe Leu Thr Leu Phe Phe Val Ser Cys Arg Pro Leu Gly Thr
115 120 125

Gly Tyr Ser Ser Tyr Gly Cys Leu Leu Leu Tyr Tyr His Ala Tyr Met
130 135 140

Ser Ser Thr Ala Phe Ile Glu Ala Asn Pro Val Ile Gln Phe Val Cys
145 150 155 160

Asp Leu Leu Asn Arg Asp Ile Ser Ser Arg Pro Leu Ser Asp Ala Asp
165 170 175

Arg Val Lys Ile Lys Lys Ala Leu Arg Gly Val Lys Val Glu Val Thr
180 185 190

His Arg Gly Asn Met Arg Arg Lys Tyr Arg Ile Ser Gly Leu Thr Ala
195 200 205

Val Ala Thr Arg Glu Leu Thr Phe Pro Val Asp Glu Arg Asn Thr Gln
210 215 220

Lys Ser Val Val Glu Tyr Phe His Glu Thr Tyr Gly Phe Arg Ile Gln
225 230 235 240

His Thr Gln Leu Pro Cys Leu Gln Val Gly Asn Ser Asn Arg Pro Asn
245 250 255

Tyr Leu Pro Met Glu Val Cys Lys Ile Val Glu Gly Gln Arg Tyr Ser
260 265 270

Lys Arg Leu Asn Glu Arg Gln Ile Thr Ala Leu Leu Lys Val Thr Cys
275 280 285

Gln Arg Pro Ile Asp Arg Glu Lys Asp Ile Leu Gln Thr Val Gln Leu
290 295 300

Asn Asp Tyr Ala Lys Asp Asn Tyr Ala Gln Glu Phe Gly Ile Lys Ile
305 310 315 320

Ser Thr Ser Leu Ala Ser Val Glu Ala Arg Ile Leu Pro Pro Pro Trp
325 330 335

Leu Lys Tyr His Glu Ser Gly Arg Glu Gly Thr Cys Leu Pro Gln Val
340 345 350

Gly Gln Trp Asn Met Met Asn Lys Lys Met Ile Asn Gly Gly Thr Val
355 360 365

Asn Asn Trp Ile Cys Ile Asn Phe Ser Arg Gln Val Gln Asp Asn Leu
370 375 380

Ala Arg Thr Phe Cys Gln Glu Leu Ala Gln Met Cys Tyr Val Ser Gly
385 390 395 400

Met Ala Phe Asn Pro Glu Pro Val Leu Pro Pro Val Ser Ala Arg Pro
405 410 415

Glu Gln Val Glu Lys Val Leu Lys Thr Arg Tyr His Asp Ala Thr Ser
420 425 430

Lys Leu Ser Gln Gly Lys Glu Ile Asp Leu Leu Ile Val Ile Leu Pro
435 440 445

Asp Asn Asn Gly Ser Leu Tyr Gly Asp Leu Lys Arg Ile Cys Glu Thr
450 455 460

Glu Leu Gly Ile Val Ser Gln Cys Cys Leu Thr Lys His Val Phe Lys
465 470 475 480

Met Ser Lys Gln Tyr Met Ala Asn Val Ala Leu Lys Ile Asn Val Lys
485 490 495

Val Gly Gly Arg Asn Thr Val Leu Val Asp Ala Leu Ser Arg Arg Ile
500 505 510

Pro Leu Val Ser Asp Arg Pro Thr Ile Ile Phe Gly Ala Asp Val Thr
515 520 525

His Pro His Pro Gly Glu Asp Ser Ser Pro Ser Ile Ala Ala Val Val
530 535 540

Ala Ser Gln Asp Trp Pro Glu Ile Thr Lys Tyr Ala Gly Leu Val Cys
545 550 555 560

Ala Gln Ala His Arg Gln Glu
565

(2) INFORMATION FOR SEQ ID NO: 426: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2175 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..2175 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585462 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:426: 
atggtgagaa agagaagaac ggatgctcca tctgaaggag gtgaaggctc tgggtctcgt 60
gaagctggtc cagtctcagg tggtggacgt ggttcacagc gaggtggttt ccagcaggga 120
ggaggacaac accaaggtgg aaggggttat actcctcaac ctcaacaggg aggtcgtggt 180
ggtcgtggat atgggcaacc accacaacag caacaacagt atggaggacc acaagagtac 240
caaggaagag gaagaggagg acctcctcat caaggaggtc gaggagggta tggcggtggc 300
cgtggaggtg gaccttcttc tggaccaccg cagagacaat cagttcccga gctgcatcaa 360
gctacctcac ctacttatca agcggtgtct tctcagccta cactgtctga ggtgagtcct 420
acccaggtac cagaacctac tgttctggct cagcaatttg aacaactctc tgttgaacaa 480
ggagctccca gtcaggcaat ccagcctata ccgtcttcta gcaaggcttt caagtttcca 540
atgaggcctg gtaaaggaca gagtggaaag cgttgcattg tgaaggctaa ccatttcttt 600
gctgaactgc ctgataagga tttgcaccat tatgatgtta ccattactcc ggaagttaca 660
tcaaggggtg tcaatcgtgc tgtgatgaaa caacttgttg ataattatcg tgattctcac 720
cttggaagtc gtcttccagc gtatgatggt cgaaaaagtc tttacactgc tggtccactt 780
ccctttaact ccaaggagtt cagaatcaat cttcttgacg aagaagtagg ggctggaggt 840
caaagacgag aaagggaatt taaagttgtg atcaagctag ttgcacgtgc tgatctgcat 900
cacctaggaa tgtttttgga ggggaaacaa tcagatgccc cacaggaagc tctgcaggtt 960
cttgacattg ttcttcatat gtcatcgaca gccttcatag aggcaaaccc tgtgattcag 1020
tttgtctgtg atttgcttaa ccgggatatt tcttctcgac ctttatctga tgctgatcgt 1080
gttaagataa aaaaggctct tagaggtgtc aaagttgaag tgactcatcg aggaaacatg 1140
cgccggaagt accgcatttc cggtttgact gctgtggcca ctcgggaatt gacattccca 1200
gtagatgaaa gaaatactca gaaatctgtt gtagaatact tccacgaaac atatggtttt 1260
cgcattcagc acactcaact accatgcttg caagttggga attctaatag gcctaattac 1320
ttaccaatgg aggtatgcaa gattgttgaa ggccagcggt attccaaaag attgaatgag 1380
agacagatca ctgctttgct gaaggttacc tgtcagcgcc cgatagatcg agaaaaagat 1440
atcttacaga cggtgcaact caatgattat gctaaagata attatgctca agagtttggc 1500
atcaaaataa gtacttctct ggcttctgtt gaggctcgta tactgcctcc tccatggctt 1560
aagtaccacg agtctggaag ggaagggact tgtctgccac aagttggtca atggaacatg 1620
atgaataaga aaatgatcaa tggtggaacg gtgaataatt ggatctgcat caacttttct 1680
aggcaagtgc aggacaatct agcgcgtaca ttttgtcagg aacttgctca aatgtgttac 1740
gtatctggca tggcatttaa tccggaacca gtcctcccac cagtcagtgc tcgccctgag 1800
caagtagaga aggtcttgaa gactagatat catgatgcca catcaaaact ctcccaagga 1860
aaagaaattg atctgcttat tgtcattctg cccgataata atggatcatt atacggtgat 1920
ttgaaacgca tatgtgagac tgaactcggc atagtctctc aatgttgcct gacaaagcat 1980
gtctttaaga tgagcaaaca atacatggct aatgttgcgc tgaagattaa tgtgaaggtt 2040
ggaggaagaa acacagtgct tgttgatgct ctatctaggc ggattcctct agtcagtgat 2100
cgacccacca ttatatttgg tgctgatgtt acccaccctc accctggaga ggattcaagc 2160
ccatctattg ctgct

(2) INFORMATION FOR SEQ ID NO: 427:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 725 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..725

(D) OTHER INFORMATION: / Ceres Seq. ID 1585463
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 427:
Met Val Arg Lys Arg Arg Thr Asp Ala Pro Ser Glu Gly Gly Glu Gly
1 5 10 15

Ser Gly Ser Arg Glu Ala Gly Pro Val Ser Gly Gly Gly Arg Gly Ser
20 25 30

Gln Arg Gly Gly Phe Gln Gln Gly Gly Gly Gln His Gln Gly Gly Arg
35 40 45

Gly Tyr Thr Pro Gln Pro Gln Gln Gly Gly Arg Gly Gly Arg Gly Tyr
50 55 60

Gly Gln Pro Pro Gln Gln Gln Gln Gln Tyr Gly Gly Pro Gln Glu Tyr
65 70 75 80

Gln Gly Arg Gly Arg Gly Gly Pro Pro His Gln Gly Gly Arg Gly Gly
85 90 95

Tyr Gly Gly Gly Arg Gly Gly Gly Pro Ser Ser Gly Pro Pro Gln Arg
100 105 110

Gln Ser Val Pro Glu Leu His Gln Ala Thr Ser Pro Thr Tyr Gln Ala
115 120 125

Val Ser Ser Gln Pro Thr Leu Ser Glu Val Ser Pro Thr Gln Val Pro
130 135 140

Glu Pro Thr Val Leu Ala Gln Gln Phe Glu Gln Leu Ser Val Glu Gln
145 150 155 160

Gly Ala Pro Ser Gln Ala Ile Gln Pro Ile Pro Ser Ser Ser Lys Ala
165 170 175

Phe Lys Phe Pro Met Arg Pro Gly Lys Gly Gln Ser Gly Lys Arg Cys
180 185 190

Ile Val Lys Ala Asn His Phe Phe Ala Glu Leu Pro Asp Lys Asp Leu
195 200 205

His His Tyr Asp Val Thr Ile Thr Pro Glu Val Thr Ser Arg Gly Val
210 215 220

Asn Arg Ala Val Met Lys Gln Leu Val Asp Asn Tyr Arg Asp Ser His
225 230 235 240

Leu Gly Ser Arg Leu Pro Ala Tyr Asp Gly Arg Lys Ser Leu Tyr Thr
245 250 255

Ala Gly Pro Leu Pro Phe Asn Ser Lys Glu Phe Arg Ile Asn Leu Leu
260 265 270

Asp Glu Glu Val Gly Ala Gly Gly Gln Arg Arg Glu Arg Glu Phe Lys
275 280 285

Val Val Ile Lys Leu Val Ala Arg Ala Asp Leu His His Leu Gly Met
290 295 300

Phe Leu Glu Gly Lys Gln Ser Asp Ala Pro Gln Glu Ala Leu Gln Val
305 310 315 320

Leu Asp Ile Val Leu His Met Ser Ser Thr Ala Phe Ile Glu Ala Asn
325 330 335

Pro Val Ile Gln Phe Val Cys Asp Leu Leu Asn Arg Asp Ile Ser Ser
340 345 350

Arg Pro Leu Ser Asp Ala Asp Arg Val Lys Ile Lys Lys Ala Leu Arg
355 360 365

Gly Val Lys Val Glu Val Thr His Arg Gly Asn Met Arg Arg Lys Tyr
370 375 380

Arg Ile Ser Gly Leu Thr Ala Val Ala Thr Arg Glu Leu Thr Phe Pro
385 390 395 400

Val Asp Glu Arg Asn Thr Gln Lys Ser Val Val Glu Tyr Phe His Glu
405 410 415

Thr Tyr Gly Phe Arg Ile Gln His Thr Gln Leu Pro Cys Leu Gln Val
420 425 430

Gly Asn Ser Asn Arg Pro Asn Tyr Leu Pro Met Glu Val Cys Lys Ile
435 440 445

Val Glu Gly Gln Arg Tyr Ser Lys Arg Leu Asn Glu Arg Gln Ile Thr
450 455 460

Ala Leu Leu Lys Val Thr Cys Gln Arg Pro Ile Asp Arg Glu Lys Asp
465 470 475 480

Ile Leu Gln Thr Val Gln Leu Asn Asp Tyr Ala Lys Asp Asn Tyr Ala
485 490 495

Gln Glu Phe Gly Ile Lys Ile Ser Thr Ser Leu Ala Ser Val Glu Ala
500 505 510

Arg Ile Leu Pro Pro Pro Trp Leu Lys Tyr His Glu Ser Gly Arg Glu
515 520 525

Gly Thr Cys Leu Pro Gln Val Gly Gln Trp Asn Met Met Asn Lys Lys
530 535 540

Met Ile Asn Gly Gly Thr Val Asn Asn Trp Ile Cys Ile Asn Phe Ser
545 550 555 560

Arg Gln Val Gln Asp Asn Leu Ala Arg Thr Phe Cys Gln Glu Leu Ala
565 570 575

Gln Met Cys Tyr Val Ser Gly Met Ala Phe Asn Pro Glu Pro Val Leu
580 585 590

Pro Pro Val Ser Ala Arg Pro Glu Gln Val Glu Lys Val Leu Lys Thr
595 600 605

Arg Tyr His Asp Ala Thr Ser Lys Leu Ser Gln Gly Lys Glu Ile Asp
610 615 620

Leu Leu Ile Val Ile Leu Pro Asp Asn Asn Gly Ser Leu Tyr Gly Asp
625 630 635 640

Leu Lys Arg Ile Cys Glu Thr Glu Leu Gly Ile Val Ser Gln Cys Cys
645 650 655

Leu Thr Lys His Val Phe Lys Met Ser Lys Gln Tyr Met Ala Asn Val
660 665 670

Ala Leu Lys Ile Asn Val Lys Val Gly Gly Arg Asn Thr Val Leu Val
675 680 685

Asp Ala Leu Ser Arg Arg Ile Pro Leu Val Ser Asp Arg Pro Thr Ile
690 695 700

Ile Phe Gly Ala Asp Val Thr His Pro His Pro Gly Glu Asp Ser Ser
705 710 715 720

Pro Ser Ile Ala Ala
725

(2) INFORMATION FOR SEQ ID NO: 428: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 545 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 
(ix) FEATURE: 

(A) NAME/KEY: peptide

(B) LOCATION: 1..545 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585465 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 428: 
Met Arg Pro Gly Lys Gly Gln Ser Gly Lys Arg Cys Ile Val Lys Ala
1 5 10 15

Asn His Phe Phe Ala Glu Leu Pro Asp Lys Asp Leu His His Tyr Asp
20 25 30

Val Thr Ile Thr Pro Glu Val Thr Ser Arg Gly Val Asn Arg Ala Val
35 40 45

Met Lys Gln Leu Val Asp Asn Tyr Arg Asp Ser His Leu Gly Ser Arg
50 55 60

Leu Pro Ala Tyr Asp Gly Arg Lys Ser Leu Tyr Thr Ala Gly Pro Leu
65 70 75 80

Pro Phe Asn Ser Lys Glu Phe Arg Ile Asn Leu Leu Asp Glu Glu Val
85 90 95

Gly Ala Gly Gly Gln Arg Arg Glu Arg Glu Phe Lys Val Val Ile Lys
100 105 110

Leu Val Ala Arg Ala Asp Leu His His Leu Gly Met Phe Leu Glu Gly
115 120 125

Lys Gln Ser Asp Ala Pro Gln Glu Ala Leu Gln Val Leu Asp Ile Val
130 135 140

Leu His Met Ser Ser Thr Ala Phe Ile Glu Ala Asn Pro Val Ile Gln
145 150 155 160

Phe Val Cys Asp Leu Leu Asn Arg Asp Ile Ser Ser Arg Pro Leu Ser
165 170 175

Asp Ala Asp Arg Val Lys Ile Lys Lys Ala Leu Arg Gly Val Lys Val
180 185 190

Glu Val Thr His Arg Gly Asn Met Arg Arg Lys Tyr Arg Ile Ser Gly
195 200 205

Leu Thr Ala Val Ala Thr Arg Glu Leu Thr Phe Pro Val Asp Glu Arg
210 215 220

Asn Thr Gln Lys Ser Val Val Glu Tyr Phe His Glu Thr Tyr Gly Phe
225 230 235 240

Arg Ile Gln His Thr Gln Leu Pro Cys Leu Gln Val Gly Asn Ser Asn
245 250 255

Arg Pro Asn Tyr Leu Pro Met Glu Val Cys Lys Ile Val Glu Gly Gln
260 265 270

Arg Tyr Ser Lys Arg Leu Asn Glu Arg Gln Ile Thr Ala Leu Leu Lys
275 280 285

Val Thr Cys Gln Arg Pro Ile Asp Arg Glu Lys Asp Ile Leu Gln Thr
290 295 300

Val Gln Leu Asn Asp Tyr Ala Lys Asp Asn Tyr Ala Gln Glu Phe Gly
305 310 315 320

Ile Lys Ile Ser Thr Ser Leu Ala Ser Val Glu Ala Arg Ile Leu Pro
325 330 335

Pro Pro Trp Leu Lys Tyr His Glu Ser Gly Arg Glu Gly Thr Cys Leu
340 345 350

Pro Gln Val Gly Gln Trp Asn Met Met Asn Lys Lys Met Ile Asn Gly
355 360 365

Gly Thr Val Asn Asn Trp Ile Cys Ile Asn Phe Ser Arg Gln Val Gln
370 375 380

Asp Asn Leu Ala Arg Thr Phe Cys Gln Glu Leu Ala Gln Met Cys Tyr
385 390 395 400

Val Ser Gly Met Ala Phe Asn Pro Glu Pro Val Leu Pro Pro Val Ser
405 410 415

Ala Arg Pro Glu Gln Val Glu Lys Val Leu Lys Thr Arg Tyr His Asp
420 425 430

Ala Thr Ser Lys Leu Ser Gln Gly Lys Glu Ile Asp Leu Leu Ile Val
435 440 445

Ile Leu Pro Asp Asn Asn Gly Ser Leu Tyr Gly Asp Leu Lys Arg Ile
450 455 460

Cys Glu Thr Glu Leu Gly Ile Val Ser Gln Cys Cys Leu Thr Lys His
465 470 475 480

Val Phe Lys Met Ser Lys Gln Tyr Met Ala Asn Val Ala Leu Lys Ile
485 490 495

Asn Val Lys Val Gly Gly Arg Asn Thr Val Leu Val Asp Ala Leu Ser
500 505 510

Arg Arg Ile Pro Leu Val Ser Asp Arg Pro Thr Ile Ile Phe Gly Ala
515 520 525

Asp Val Thr His Pro His Pro Gly Glu Asp Ser Ser Pro Ser Ile Ala
530 535 540

Ala
545
































(2) INFORMATION FOR SEQ ID NO: 429:

















(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..490

(D) OTHER INFORMATION: / Ceres Seq. ID 1585469
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 429:
agatggngtn natcgacncc ccaaatgggc tgaaccagaa ctatctacta aagttgcttt 60
gcaagttcag accactgctg ctattcgagg acctgttgtc acaccaacca atgcacctct 120
acaactgcat ccacctccga agaggcaaag gaccgaacta gttgacaaac ttaacctaac 180
caaagagcca cctgatgtac ttgaagactt cctgctgttt atattcaccc ttatgaagac 240
taaacgacat ttaggagctg cagagataac acacattcgg ttcacacatc caaaaccaag 300
agagaagaaa ctttccttaa tcggaggatc tcttgccttt tcactcaaga tcgaatatcc 360
tgctcttgat ccagttcgtt tactcactct agaagatctc agccatccat ctcatccacc 420
agacctaaga agattcatga agatcacgcc acaagccaaa ggagcgtcac ttttaatgcc 480
caacggctag

(2) INFORMATION FOR SEQ ID NO: 430:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 162 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..162

(D) OTHER INFORMATION: / Ceres Seq. ID 1585470
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 430:
Asp Xaa Xaa Xaa Arg Xaa Pro Lys Trp Ala Glu Pro Glu Leu Ser Thr
1 5 10 15

Lys Val Ala Leu Gln Val Gln Thr Thr Ala Ala Ile Arg Gly Pro Val
20 25 30

Val Thr Pro Thr Asn Ala Pro Leu Gln Leu His Pro Pro Pro Lys Arg
35 40 45

Gln Arg Thr Glu Leu Val Asp Lys Leu Asn Leu Thr Lys Glu Pro Pro
50 55 60

Asp Val Leu Glu Asp Phe Leu Leu Phe Ile Phe Thr Leu Met Lys Thr
65 70 75 80

Lys Arg His Leu Gly Ala Ala Glu Ile Thr His Ile Arg Phe Thr His
85 90 95

Pro Lys Pro Arg Glu Lys Lys Leu Ser Leu Ile Gly Gly Ser Leu Ala
100 105 110

Phe Ser Leu Lys Ile Glu Tyr Pro Ala Leu Asp Pro Val Arg Leu Leu
115 120 125

Thr Leu Glu Asp Leu Ser His Pro Ser His Pro Pro Asp Leu Arg Arg
130 135 140

Phe Met Lys Ile Thr Pro Gln Ala Lys Gly Ala Ser Leu Leu Met Pro
145 150 155 160

Asn Gly

(2) INFORMATION FOR SEQ ID NO: 431: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 85 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(ix) FEATURE:

(A) NAME/KEY: peptide

(B) LOCATION: 1..85

(D) OTHER INFORMATION: / Ceres Seq. ID 1585471
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 431:



Met Lys Thr Lys Arg His Leu Gly Ala Ala Glu Ile Thr His Ile Arg
1 5 10 15

Phe Thr His Pro Lys Pro Arg Glu Lys Lys Leu Ser Leu Ile Gly Gly
20 25 30

Ser Leu Ala Phe Ser Leu Lys Ile Glu Tyr Pro Ala Leu Asp Pro Val
35 40 45

Arg Leu Leu Thr Leu Glu Asp Leu Ser His Pro Ser His Pro Pro Asp
50 55 60

Leu Arg Arg Phe Met Lys Ile Thr Pro Gln Ala Lys Gly Ala Ser Leu
65 70 75 80

Leu Met Pro Asn Gly
85

(2) INFORMATION FOR SEQ ID NO:432: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 594 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..594 

(D) OTHER INFORMATION: / Ceres Seq. ID 1585624 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 432: 
atggctgaag aacagaatca acaaaatggc cctgtcaaca ttggtgctcg agatgcacca 
cgtgatcacc gtcagaggaa gggaattgca cctcctgcta tcctgaacaa caacttcgag 
attaagagtg gtctcatctc gatgattcag gggaacaaat tccatggtct gccaatggag 
gatccactcg accaccttaa tgaattcgat aggctctgca acctaacgaa gatcaatggt 
gtcagtgaag acggattcaa gctctgtttg tttccattct ccttaggcga caaagcacac 
atctgggaga agaatctgcc ccatgactca atcaccacat gggatgattg caagaaggct 
tttctatcaa agttcttctc aaatgccata actgcaagac tcagaaatga gatttctggt 



